IN THE CONTEXT OF PRESENTING A FRAMEWORK FOR PLANNING AND
DEVELOPMENT OF MANAGEMENT INFORMATION SYSTEMS 


FUNCTIONAL REQUIREMENTS. Such a document and its
amendments should always reflect the current statement
of WHAT is to be done by the system.


The functional requirements define the constraints 
placed on the system by its users. The DATA REQUI- 
REMENTS, the DATA VOLUMES, and the RATE OF PRO- 
CESSING are constraints imposed by the immediate 
users. The constraints of more remote users are 
imposed through the specification of INTERFACES 
with related systems.


For better understanding of the concept of functional speci- 
fications, compare it with the author's concept of NON- 
FUNCTIONAL specification: it reflects the hardware and soft- 
ware characteristics of the method of system implementation. 
The author develops a system definition based on a "black
box" concept of a system. The definition of a system then consists,
among other things, of defining the INPUT DATA.


INPUT DATA DEFINITION includes specifying: 

- Where they come from 

- What FORM they are in, and 

- Who is responsible for their PRODUCTION

Furthermore the definition may include the
clerical procedure for transcription of a document
into machine readable input at its place of
origin, the method of transferring data between
locations, and the clerical procedure for producing
subsidiary source documents if, for example,
data are gathered from a number of source
documents.


In discussing the data base as one of the technological 
elements of a management information system, the author 
considers the issue of the "cost-value relationship". 


The COST-VALUE RELATIONSHIP must be applied by 
the user to his analysis of requirements concerning 
- The DEGREE OF DETAIL 
- The AGE OF DATA 
- The ease of retrieval, and 
- The variety in formats maintained by his system. 


As a methodological background to his concept of system, 
the author undertakes a synthesis of Jay Forrester's concepts 
of information-decision-action, Herbert Simon's programmed- 
non-programmed decisions, and Robert Anthony's hierarchy of 
planning and control. This results in the following defini- 
tions: 

- A DATUM is an uninterpreted raw statement of fact, 


- INFORMATION is DATA recorded, classified, organi- 
zed, related or interpreted within context to 
convey meaning, 
F.J. Carr (1970)
IN THE CONTEXT OF URBAN STATISTICS AND THEIR TREATMENT 
AND USE FOR DECISION MAKERS 


Urban statistics include all observations made by
public, semipublic and private organizations.
The data are collected because of legal requirements,
administrative needs, or to facilitate decision-making.
It appears that very little of the data recorded is,
in fact, collected for decision-making purposes.
This is an important fact.


The characteristics of the data systems suggest
that most DATA ERRORS occur at the time the observation
is made and that there is no significant
ACCURACY DETERIORATION after recording. The 
RELIABILITY of data, however, is good - i.e. most 
data tends to be CONSISTENT from one reporting 
period to the next. 
(Casual) Document (1964)


IN THE CONTEXT OF A STUDY ON THE COST AND VALUE 
OF INFORMATION 


VALUE of information is most certainly tied to 
those familiar standards of ACCURACY and TIMELINESS. 
While well-known as clichés, they are, nevertheless 
also difficult to formulate. 

ACCURACY, for example, may be merely spurious, tied 
to some degree of precision more apparent than real. 
There are cases where penny bookkeeping can give 
way to dollar amounts and truncated figures, proba- 
bly with little loss in the essential MEANING and 
ACCURACY. Conversely, there are numerical methods
which give entirely meaningless results because
all PRECISION has vanished at the level of single
length floating point computation. Approximate
answers serve satisfactorily for many problems,
while being inefficient for others. Building a
system to obtain more accuracy may incur
additional costs with questionable improvements
in value.
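
The loss of precision in single length floating point computation can be made concrete with a minimal sketch. It is an illustration of ours, not taken from the document quoted above; the helper name to_single and the numbers used are assumptions chosen for the example. A small difference that survives in double length arithmetic vanishes completely in single length:

```python
import struct


def to_single(x: float) -> float:
    """Round a double-length Python float to the nearest single-length value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# The quantity of interest is the small difference (1 + 1e-8) - 1.
true_difference = 1e-8

# Double-length arithmetic keeps the difference almost exactly.
double_result = (1.0 + 1e-8) - 1.0

# Single-length arithmetic cannot represent 1 + 1e-8, so the
# difference disappears entirely: all precision has vanished.
single_result = to_single(to_single(1.0 + 1e-8) - to_single(1.0))

print("double length:", double_result)
print("single length:", single_result)
```

The single length result carries no information about the true difference, although every individual number involved was stored with about seven significant digits.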


TIMELINESS of information is a complex function 
of the time period for which the information is 
gathered (interval) and the waiting time until it 
becomes available (delay). 


DEPENDABILITY of information is an element of the 
value of the information and contains the statisti- 
cal concept of STANDARD DEVIATION. More than 
PRECISION or AMOUNT OF DETAIL involved, dependability 
implies a system of BUILT-IN CHECKS from data-gathering,
through data-processing (via validity and parity hardware),
to data-recording, along with sound sampling techniques
to insure that information is ultimately portrayed for
conclusions with a high DEGREE OF CONFIDENCE.


IN THE CONTEXT OF A STUDY FOR THE DEVELOPMENT OF
A CORPORATE PRODUCT INFORMATION SYSTEM 


The Product Information System processes the 
information that is required to develop, market, 
build, schedule and maintain the company's product 
line. 


The fundamental objective of any product information
system is to provide to the operating functions
of the business ACCURATE AND TIMELY information
required to perform their tasks at a minimum
cost.


System performance should be monitored against 
objectives and an evaluation should be done of
the financial returns. 


The performance of an information system is
measured in terms of thruput capacity, TIMELINESS,
CYCLE TIME, ACCURACY, cost per unit of information,
ease of use, etc. Further, each of these factors
interacts with the others, e.g. ACCURACY of information
is directly related to its TIMELINESS.
Fragmentation of the information system into subsystems
contained within organizational divisions
makes correlation of these factors difficult and 
financial understanding of the operation of the 
system almost impossible. 


(Casual) Document (1970)


IN THE CONTEXT OF FOLLOWING UP THE DEVELOPMENT AND
PREPARING FOR THE INSTALLATION OF A CORPORATE INFORMATION
SYSTEM


The progress of the project of designing the cor- 
porate information system showed that the data 
bank has come to be recognized as being one of the 
most important parts of the system. 


In parallel with this recognition it has become 
abundantly clear that the INTEGRITY OF THE DATA
in the data bank, and the operational problems
associated with the MAINTENANCE OF THIS INTEGRITY 
are going to be of major importance to the success 
of the overall system. The result of these insights 
is the evolution of the concept of DATA MANAGEMENT. 


DATA MANAGEMENT is now a concept associated with 
the following activities which will ensure the 
continuing ACCURACY and INTEGRITY of the data 
bank: 
1. DATA SPECIFICATION, for the documentation and
control of all data codes, data elements, records, 
files, transactions, messages, and reports. 

2. GENERALIZED INFORMATION RETRIEVAL, raising the
problem of data security. 

3. DATA SECURITY, requiring safety-dumps procedures 
and policy for protection of vital records. 

4. FILE CLEAN-UP based on VALIDITY CHECKING of the
data. Continuing DATA-BANK INTEGRITY, after ini- 
tial clean-up will be based on CRITERIA FOR THE 
ACCEPTANCE OF DATA as well as on SAMPLING PROCE- 
DURES, by which Data Management will be able to 
accept or reject the addition of a new system or 
of a system-extension in an on-line environment. 


The paper goes on to list other activities of minor importance
for our issue, such as data bank layout and creation,
file reorganization, and forecasting/allocation of storage
space. The paper later states that the Data Management
activities will be allocated among:


- LOGICAL Data Management, controlling e.g. the 
INTEGRITY of the data bank against data-specifi- 
cations. 

- ADMINISTRATION of Data Management, administering 
SECURITY procedures, documenting security viola- 
tions and DATA ERRORS, and gathering data-bank 
statistics. 

- TECHNICAL Data Management, controlling FILE 
CLEAN-UP and back-up procedures. 


In a discussion of future organization and staffing of
Data Management, the paper suggests a split of its responsibilities,
allocating a part of them to the company
functions going under the names of: Technical Support (to
Data Processing), Data Processing, Applications Development,
and the "USERS".


Finally the paper states that other concepts exist in
close association with Data Management (on which we have
concentrated up to now):


SYSTEM INTEGRITY - Analyzes e.g. the data-flow
within a divisional location, considers environmental
constraints, develops and issues philosophies
for the design of information systems, and controls
the INTEGRITY of the information system and of the
data bank.


PLANNING AND CONTROL - Analyzes e.g. already installed
local systems for compatibility, etc., develops
installation plan for hardware, software and applications,
and controls system costs and SYSTEM
PERFORMANCE.
IN THE CONTEXT OF AUDITING EDP SYSTEMS


In addition to evaluating the internal control 
of an EDP system, the auditor must evaluate the 
REASONABLENESS of those records produced by the 
system, which relate to the EXISTENCE and proper 
VALUATION of assets, liabilities, equities, and 
transactions. 


Computer audit programs can assist in the 
performance of auditing procedures such as: 


- Selection of EXCEPTIONAL transactions and 
accounts for examination. 


- COMPARISON of data for CORRECTNESS AND 
CONSISTENCY. 


- CHECKING of information obtained directly by 
the auditor, with company records. 


- Performance of arithmetic and clerical 
functions. 


- Preparation of confirmations. 


EDP Analyzer (February 1968) 


IN THE CONTEXT OF USE OF DATA MANAGEMENT SYSTEMS 


Unstructured reporting systems used for management 
control will be at the mercy of the QUALITY of the 
data stored in the data files. In structured data 
systems, experience from use has led to the esta- 
blishment of the necessary data quality controls. 
Data of secondary interest, that does not appear 
in the structured reports, generally is not con- 
trolled - and therefore might have a high ERROR 
CONTENT. Such data could affect the unstructured 
system.


The following are given as some of the major causes of 
POOR DATA: 


ERRONEOUS DATA, including INCORRECT CODING of 
classification fields and WRONG INPUT of quantity 
fields. 


MISSING DATA - transactions not entered 


EVENTS THAT DO NOT CONFORM TO POLICY, but recording 
of these events is forced to fit existing data 
recording structures. 


Important fields normally NOT RECORDED FORMALLY;
hard to control their quality when input to system.


The TIME an event occurs may differ from its planned
time of occurrence; it may be either early or late;
may result in an apparent deviation from the plan
that really has little meaning.


Different organizational units may have different 
INTERPRETATIONS of the TIMING of an event; one 
"date of transaction" may not satisfy all users. 


An example of the fourth cause above may be taken from a 
department store stock control where dollar inventory records 
are normally kept by class of merchandise. While it might
be desirable to have actual stock inventory records by units
of merchandise, it usually hasn't been economical to do so.
Whatever the sales clerk records about the class of merchandise
sold is used for updating of inventory records with no
way to insure good accuracy of the class number.


Unfortunately, no examples are given of the very interesting 
case of events that do not conform to policy, being forced 
to fit existing data recording structures. 
IN THE CONTEXT OF EVALUATING THE COST-EFFECTIVENESS
OF MILITARY COMMAND AND CONTROL SYSTEMS 


A military command and control system may be seen
as composed of subsystems for data-gathering or
reporting, analysis, and transmission or promulgation
of orders.


The relation of the first and of the third of the above 
subsystems to the issue of quality of information will be 
reviewed below. Prior to this, the author states that the 
ACCURACY of a cost estimate for a new control system depends 
upon: 

1. The value of performance

2. The ACCURACY of the system, i.e. how well the
function to be performed has been defined

3. The performance level desired.


In the context of the DATA GATHERING OR REPORTING SUBSYSTEM, 
the author argues that its major performance factors are 
timeliness, accuracy and reliability. 


TIMELINESS. How much is it worth to have the data
a day, hour, five minutes or sooner? Given a specific
data requirement, it is probably possible for
an experienced military commander to put an arbitrary
(approximate) value on the timeliness of the
data.


ACCURACY. How much is accuracy worth in a data-collection
system? This again is dependent
upon the nature of the system, of the situation
and of the data, but also on the ACCURACY OF THE
RAW DATA and the quantity of the data. Given a specific
requirement for the data, arbitrary and approximate
values can be assigned by the commander.
It is not possible to do this in the abstract.

(The ACCURACY OF THE SYSTEM could be defined as the
percentage of the data entered into the system
which arrives UNCHANGED at the output of the data-collection
system).


RELIABILITY could be defined as the percentage of 
the time that the system is performing in its nor- 
mal manner. 
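
The two percentage definitions above can be sketched in a few lines. This is only an illustration of ours: the function names, the representation of messages as strings, and the sample figures are assumptions, not part of the author's text.

```python
def system_accuracy(entered, delivered):
    """Percentage of entered items that arrive unchanged at the output.

    Assumes both lists are in the same order and of equal length;
    an item lost in transit is represented by None in `delivered`.
    """
    if not entered:
        raise ValueError("no data entered")
    unchanged = sum(1 for sent, got in zip(entered, delivered) if sent == got)
    return 100.0 * unchanged / len(entered)


def system_reliability(normal_hours, total_hours):
    """Percentage of the time the system performs in its normal manner."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * normal_hours / total_hours


# Hypothetical position reports: the third one is garbled in transmission.
entered = ["A1", "B2", "C3", "D4"]
delivered = ["A1", "B2", "C8", "D4"]

print(system_accuracy(entered, delivered))   # share of unchanged reports
print(system_reliability(714, 720))          # 714 normal hours in a 720 hour month
```

Note that accuracy in this sense says nothing about the ACCURACY OF THE RAW DATA: a wrong observation delivered unchanged still counts as accurate transmission.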


Certain types of command situations permit a relatively
ACCURATE and profitable assessment of the
value of timeliness, accuracy and reliability. 
Consider the case of a moving target with a known 
top speed. Knowledge of the EXACT PRESENT LOCATION 
is limited by the speed, accuracy and reliability 
of the reporting subsystem. If we don't know of any 
restraints on its direction of travel, we must 
assume the target has a certain probability of 
being within a circle whose radius is determined 
by its speed and the AGE AND QUALITY of our know- 
ledge of its last position. 
If we assume for simplicity that we have an ACCURATE,
reliable delivery system and a certain radius of kill,
we can calculate the number of weapons which must be
applied to the target area to give a desired probability
of destroying the target.


According to a model, the number of weapons goes up as
a function greater than, but asymptotic to, the square
of the LINEAR UNCERTAINTY as to the location of the
target. This uncertainty includes, when you are estimating
the number of weapons to stock:

- The reporting ACCURACY

- (Speed of the target) x (Probable reporting time loss)

- A safety factor for the fact that the information you
have may be older than you think (reliability of the
reporting subsystem).
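
A rough numerical sketch of this reasoning follows. It rests on two assumptions of ours that the text does not state: that the three components of the linear uncertainty simply add, and that the weapon count is compared only through its asymptotic square law (the actual function is said to be somewhat greater). Function and parameter names are hypothetical.

```python
def linear_uncertainty(reporting_accuracy, target_speed,
                       reporting_time_loss, safety_margin=0.0):
    """Linear uncertainty of the target location, in distance units.

    Assumption: the three components listed in the text are additive.
    """
    return reporting_accuracy + target_speed * reporting_time_loss + safety_margin


def relative_weapon_count(uncertainty, baseline_uncertainty):
    """Asymptotic ratio of weapons needed, using the square law only."""
    return (uncertainty / baseline_uncertainty) ** 2


# Hypothetical figures: 2 km reporting error, target moving at 30 km/h,
# half an hour of reporting delay, 3 km safety margin.
slow_reports = linear_uncertainty(2.0, 30.0, 0.5, 3.0)

print(slow_reports)                                # uncertainty in km
print(relative_weapon_count(slow_reports, 10.0))   # versus a 10 km baseline
```

Under these assumptions, halving the linear uncertainty cuts the asymptotic weapon requirement to a quarter, which gives one way of putting a value on a faster or more reliable reporting subsystem.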




In the context of the ORDER TRANSMISSION SUBSYSTEM, the author
states that


ACCURACY is extremely important for the improved performance
of each subsystem. RELIABILITY, i.e. the probability
that the command will be delivered, is also of great
value. The value of speed may be dependent in part upon
the response time of the force commanded. Values can also
be assigned to degrees of reliability and accuracy.


W. Edwards et al. 


IN THE CONTEXT OF PROBABILISTIC INFORMATION PROCESSING SYSTEMS


Probabilistic information processing systems embody ideas which
are relevant to any setting in which formal diagnosis is important,
including governmental and business settings. In all such
settings the decision-maker must face uncertainty and he typically
feels that he has too little information. Much of the
effort was aimed at dealing with uncertainty by providing
decision-makers with more and more information. Unfortunately,
more information is not the complete answer. Some way of providing
better information would be ideal - a military commander
would be delighted to know his opponent's battle plans.

But BETTER INFORMATION is often not available. ABUNDANT and
often ACCURATE information about questions only peripherally
related to what the decision-maker really wants to know must
somehow substitute. THE PROBLEM OF DIAGNOSIS IS IN LARGE PART
THAT OF MAKING QUANTITY OF INFORMATION SUBSTITUTE FOR QUALITY.





If people estimate likelihood ratios for each datum and each
pair of hypotheses under consideration or a sufficient subset
of these pairs, a computer can subsequently aggregate these
estimates, by means of Bayes' theorem of probability theory,
into a posterior distribution that reflects the impact of all
available data on all hypotheses being considered. This circumvents
human conservatism in information processing, that is,
human inability to aggregate information in such a way as to
modify one's own opinions as much as the available data justify.
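
The aggregation step described above can be sketched as follows. This is a minimal illustration of ours, assuming that the data are conditionally independent given each hypothesis and that every likelihood ratio is expressed relative to one common reference hypothesis; the function name, hypothesis labels and numbers are hypothetical.

```python
from math import prod


def posterior_distribution(priors, likelihood_ratios):
    """Aggregate human-estimated likelihood ratios by Bayes' theorem.

    priors            -- dict: hypothesis -> prior probability
    likelihood_ratios -- list with one dict per datum:
                         hypothesis -> P(datum | hypothesis) / P(datum | reference)
    Assumes the data are conditionally independent given each hypothesis.
    """
    weights = {
        h: p * prod(ratios[h] for ratios in likelihood_ratios)
        for h, p in priors.items()
    }
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Two hypotheses, equally likely a priori; "exercise" is the reference.
priors = {"attack": 0.5, "exercise": 0.5}
estimates = [
    {"attack": 2.0, "exercise": 1.0},   # datum 1 is twice as likely under "attack"
    {"attack": 3.0, "exercise": 1.0},   # datum 2 is three times as likely
]

posterior = posterior_distribution(priors, estimates)
print(posterior)
```

Two individually weak data move the opinion from even odds to six to one, a larger revision than conservative human judges are reported to make unaided.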
IN THE CONTEXT OF THE ECONOMICS OF INFORMATION IN
ORGANIZATIONAL PLANNING AND CONTROL SYSTEMS 


In a formal model, one can, through a process of selectively
varying INPUT DATA over the estimated range
of possible values, identify those variables that
are critical in determining pay-off. Effort can then
be spent on refining the estimates of the variables;
but if the costs of such REFINEMENT or the INHERENT
STATISTICAL VARIABILITY in a process preclude narrowing
the range of the estimate to within the region
of relative insensitivity for the variable in question,
one might better try to make structural changes
in the physical process (e.g. production process)
being modeled, rather than try to improve
FORECAST ACCURACY.
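
The selective variation of input data can be sketched as a one-variable-at-a-time sensitivity analysis. The pay-off model, the variable names and the ranges below are assumptions of ours chosen for illustration; the author gives no specific model.

```python
def payoff_sensitivity(payoff, base_inputs, ranges, steps=5):
    """Spread of the pay-off when each input alone is varied over its range.

    payoff      -- function taking the inputs as keyword arguments
    base_inputs -- dict: variable -> best estimate
    ranges      -- dict: variable -> (low, high) estimated range
    Returns a dict: variable -> (max pay-off - min pay-off).
    """
    spreads = {}
    for name, (low, high) in ranges.items():
        values = [low + (high - low) * k / (steps - 1) for k in range(steps)]
        payoffs = [payoff(**{**base_inputs, name: v}) for v in values]
        spreads[name] = max(payoffs) - min(payoffs)
    return spreads


def profit(price, volume, unit_cost):
    """Hypothetical pay-off model: contribution margin times volume."""
    return (price - unit_cost) * volume


base = {"price": 10.0, "volume": 1000.0, "unit_cost": 6.0}
ranges = {
    "price": (9.0, 11.0),
    "volume": (900.0, 1100.0),
    "unit_cost": (5.5, 6.5),
}

spreads = payoff_sensitivity(profit, base, ranges)
critical = max(spreads, key=spreads.get)
print(spreads)
print("most critical variable:", critical)
```

In this example the price estimate dominates the pay-off, so refinement effort would go there first; a variable with a small spread lies in the region of relative insensitivity.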


In the absence of quantitative estimates of INFORMATION
VALUE, design decisions in developing organizational
information systems must be guided by
QUALITATIVE CHARACTERISTICS OF INFORMATION that
govern both its value and its cost. We speak then
of approaches that require a lower degree of formalization.


ACCURACY and RESPONSE TIME may be seen as two of the 
quality characteristics that determine the VALUE 
and the COST of information.


QUALITY CHARACTERISTICS WHICH DETERMINE THE
VALUE OF INFORMATION


RESPONSE TIME can be defined as the time interval 
required to perform an information processing 
operation: updating of a record or the retrieval 
of the data. Reducing the time interval to update 
a record means that the data base provides a more 
CURRENT VIEW of nature: if the planning horizon 
extends only a short time into the future and if 
nature is quite uncertain so that any prediction 
about the future is subject to rapid decay, the 
reduced updating time (or more generally a reduced 
processing time lag) means a significantly shorter 
prediction span and increases the ACCURACY in 
estimating (predicting) the future state of planning 
variables over the planning horizon. 


ACCURACY. In the case of decision processes that
deal with unaggregated data, the VALUE of information
may be highly sensitive to ERRORS (e.g. an
error in a bank account balance may be very expensive).
When data are aggregated for high-level decisions
(such as an analysis of bank deposits by districts)
the VALUE OF GREAT ACCURACY drops off
sharply.
Accuracy refers not only to the DEGREE TO WHICH SENSED
INFORMATION CORRESPONDS TO THE ENTITY IT PURPORTS TO
MEASURE; it also applies to the DEGREE TO WHICH A
PREDICTED VALUE (such as sales forecast) CORRESPONDS
TO THE EVENTUAL ACTUAL VALUE.


If the values over time of a given variable exhibit
some stability (e.g. if the current rate of sales is
related to previous rates), RANDOM ERRORS in sensing
or prediction can be reduced by "smoothing" the data
through an averaging process. Increasing the time span
over which data are averaged reduces the random component
of the resulting average at the expense of
reducing its RECENCY (delaying its availability). Thus
a trade-off often exists between ACCURACY AND RECENCY.
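
The trade-off can be illustrated with a simple moving average. The sketch below is ours: the sales figures are invented, and the unweighted moving average is only one possible averaging process among those the text may have in mind.

```python
def moving_average(series, window):
    """Unweighted average of the last `window` observations at each period."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def average_age(window):
    """Average age, in periods, of the observations inside one window."""
    return (window - 1) / 2


# Hypothetical weekly sales fluctuating randomly around a stable level of 100.
sales = [100, 104, 96, 102, 98, 103, 97, 101]

smoothed = moving_average(sales, window=4)

print("raw range:     ", min(sales), "to", max(sales))
print("smoothed range:", min(smoothed), "to", max(smoothed))
print("average age of smoothed figure:", average_age(4), "periods")
```

With a four-period window the fluctuation shrinks from 8 units to 0.75 units, but each smoothed figure describes a state of affairs that is on average 1.5 periods old.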


QUALITY CHARACTERISTICS AS THEY AFFECT 
THE COST OF INFORMATION 


RESPONSE TIME costs are related to computation costs 
(batched or random processing of transactions) and to 
data transmission costs. 


ACCURACY. Almost any degree of PERFECTION can be achieved,
but costs tend to rise very steeply as perfection
is approached. Accuracy is achieved primarily through
REDUNDANCY, DUPLICATION, CHECK DIGITS, REASONABLENESS
CHECKS, VALIDITY CHECKS; all these ERROR-CONTROL TECHNIQUES
rely ultimately on some form of redundancy, and
all cost money in the form of extra data-collection,
transmission, storage or processing.
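
How a CHECK DIGIT buys accuracy with redundancy can be shown with the well-known Luhn (modulus 10) scheme. The choice of this particular scheme is ours; the text does not name any specific check-digit method.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn modulus 10 check.

    The last digit is the redundant check digit; every second digit,
    counted from the right, is doubled before summing.
    """
    total = 0
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("79927398713"))  # correctly transcribed account number
print(luhn_valid("79927398710"))  # last digit mis-keyed
print(luhn_valid("79927398731"))  # two adjacent digits transposed
```

The cost is one extra digit to collect, transmit, store and process for every number; the return is that any single mis-keyed digit is detected.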


QUALITY AS DISCUSSED IN THE CONTEXT
OF DATA-MANAGEMENT 


In order to keep the data base a faithful image of 
reality, the data-management function must maintain 
the VALIDITY of the data entering the system.


Typically, the data base already contains considerable
prior information about input data: their format,
allowed character mode (e.g. alphabetic or numeric),
and the set or range of permitted values. The input
data are thus partially redundant. THIS PROVIDES A
MEANS TO TEST FOR VALIDITY. If the input data meet
all checks as to FORMAT, RANGE, and so forth, they
are assumed to be valid. Validity checks can then
screen out many common errors and can usually call
into question a "large" error. A "small" error is
much more difficult to identify, but failure to detect
it often results in relatively minor consequences.
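
The validity test described above might be sketched as follows. The field names, the rule layout and the sample records are assumptions of ours; they only illustrate how prior knowledge of format, character mode and range is used to screen input.

```python
def validity_errors(record, rules):
    """Check a record (field -> string) against format, mode and range rules."""
    errors = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif len(value) > rule["max_length"]:
            errors.append(f"{field}: too long")
        elif rule["mode"] == "numeric" and not value.isdigit():
            errors.append(f"{field}: not numeric")
        elif rule["mode"] == "alphabetic" and not value.replace(" ", "").isalpha():
            errors.append(f"{field}: not alphabetic")
        elif "range" in rule:
            low, high = rule["range"]
            if not low <= int(value) <= high:
                errors.append(f"{field}: out of range")
    return errors


# Prior information held in the data base about the input data (hypothetical).
RULES = {
    "name": {"mode": "alphabetic", "max_length": 20},
    "quantity": {"mode": "numeric", "max_length": 4, "range": (1, 500)},
}

large_error = {"name": "John Smith", "quantity": "9000"}   # should have been 90
small_error = {"name": "John Smith", "quantity": "91"}     # should have been 90

print(validity_errors(large_error, RULES))
print(validity_errors(small_error, RULES))
```

The "large" error is called into question by the range check, while the "small" error passes every check and is assumed valid, as the text anticipates.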
IN THE CONTEXT OF AUDITING AND OF MANAGEMENT CONTROL
OF ELECTRONIC DATA PROCESSING 


In considering the entire business organization, 
the controls which management uses to accomplish 
its objectives may be described as

"the plan of organization and all of the coordinate
methods and measures adopted within a business
to safeguard its assets, check the ACCURACY
and RELIABILITY of its data, promote operational 
efficiency, and encourage adherence to prescribed 
managerial policies." 


This broad concept of control applies to any function
in an organization, including an EDP system.
In terms of the EDP system itself, however, controls 
may be described as 

"a plan to ensure that only VALID data is accep- 
ted and processed, COMPLETELY and ACCURATELY, and 
that necessary information and records are provided". 


The authors go on to develop the meaning of several of the
terms used in the statements above.


VALID means CORRECT and AUTHORIZED 


COMPLETELY means "remaining intact throughout pro- 
cessing, and being fully processed through all 
appropriate computer operations". 


ACCURATELY means "without undetected ERRORS".
It means further, that processing FULLY ACCOMPLISHES
ITS PURPOSE and is in accordance with management's
policies and instructions.


NECESSARY INFORMATION means "data reported by the 
EDP system both for operating purposes and for com- 
parison with related data available from within the 
EDP system or external to it for the purpose of 
proving the COMPLETENESS and ACCURACY of the pro- 
cessing and identifying exceptions thereto". 


RECORDS means "an information trail and retrievable 
data storage adequate for the reconstruction (if 
necessary) of current records either for future pro- 
cessing or to meet the information requirements of 
management, customers, auditors, Internal Revenue 
Service, and other outside agencies". 


By incorporating control-providing procedures in
an EDP system, not only will the system possess
a high degree of RELIABILITY, but also the ACCURACY
and ORDERLINESS which result will lead to greater 
processing EFFICIENCY by reducing the number of 
ERRORS that require manual intervention and repro- 
cessing. Another advantage to be derived from ac- 
complishing the control objectives concerns the 
risk of loss through INTERNAL FRAUD. 
IN THE CONTEXT OF AN INTRODUCTION TO
"DATA MANAGEMENT" 


DATA MANAGEMENT is the control, retrieval, and
storage of information to be processed by a computer.
Each of these three areas of data management
is an essential function of any information system.


The paper goes on to define and discuss each of the three
concepts above. We shall concentrate our attention on 
"control" since it most closely affects the aspects of 
information-quality. 


CONTROL is the authorization and supervision of the 
data management process. AUTHORIZATION IS THE 
VALIDATION of a user's right to access or modify 
the information in the system. SUPERVISION includes 
monitoring the location of information, insuring 
against data loss (DATA INTEGRITY) and insuring 
that the information in the system is CURRENT. 


In the above context, INFORMATION is defined as 
ideas and FACTS about ENTITIES such as people, 
places, machines, etc. Information about entities 
is composed of: 


1. CONTEXT defined by the characteristics of
an entity, also called information ATTRIBUTES.
For people they are e.g. Name, Address, Social
Security Number etc.

2. DATA, which is represented by DATA VALUES,
e.g. "John Smith" for the attribute "Name"

3. DATA REPRESENTATION, which is represented by
DATA ATTRIBUTES (e.g. "20 Alpha Characters")




It is the function of Data Management to build 
MEANINGFUL INFORMATION by bringing together the 
PROPER context, data, and data representation. 


An Information System is a system that controls, 
maintains and provides concurrent access to a pool 
of information for AN IDENTIFIABLE SET OF USERS.

One of the advantages of an information system is 
that it makes possible DATA CONSISTENCY: access to 
data can be limited to those users capable of using 
it correctly. Because the system processes each field 
it can also check to see IF THE VALUE OF THE FIELD 
IS VALID AND REASONABLE. However, even if the system 
can provide REASONABLENESS CHECKS, it cannot be 
responsible for the ABSOLUTE VALUE OF THE DATA. 


System knowledge of context IS THE MOST IMPORTANT
DESIGN CRITERION OF AN INFORMATION SYSTEM. Another
requirement or criterion is the SECURITY AND INTEGRITY
OF DATA, i.e. protection against accidental,
inadvertent loss or destruction and INACCURACY of
sensitive data (DATA INTEGRITY) and protection against
unauthorized access (DATA SECURITY). Equally
important as prevention is the detection and correction
of events violating security and integrity.
R.H. Lauren (1970)


IN THE CONTEXT OF RELIABILITY OF DATA BANK RECORDS 


The problem of RELIABILITY is the problem of
insuring and maintaining the ACCURACY of information
contained in data banks, regardless of who
has access to the data or whether the information 
is private or public. 


In regard to reliability, two specific areas
are identifiable for concentrated effort in the
future:


- The problem of existing files: how to
CLEAN them UP to meet whatever STANDARDS will
be ACCEPTABLE.


- How to increase the areas of CONTROLLABILITY
for the input of information.
H.G. Lundin & B. Sundgren (1969)
IN THE CONTEXT OF A DEBATE ON PUBLIC DATA-BANKS 
AND NATIONAL INFORMATION CENTERS 


In order to define the risks and responsibilities implied in
the design and operation of data-banks, the authors use in the
above context a matrix in order to visualize the interactions
or consistencies among the goals-desires emanating from the
government, the citizen as an individual, and organizations
such as business firms, newspapers and political parties.
In the "conflict matrix" above, the sign "+" at row 10 and
column 01 shows that goal 01 has goal 10 as precedent, that
is, the possibilities for follow-up control are improved by
the contribution of detailed information on others
(citizens-individuals and business firms). Blank positions stand for
neutrality or independence. The goal-numbers mean the following:


01 - Possibilities to follow-up the implementation of laws
such as on taxation and military service
02 - Basis for social planning
03 - Imposition, obligation to report to the data-bank in order
to guarantee "automatic" flow of updatings
04 - High quality
05 - Legal security for the individual 
06 - Low reporting effort, respect for the citizen's time 
07 - Integrity, protection against discrimination 
08 - Follow-up of right to social benefits 


09 - Market information like addresses of possible customers etc. 
10 - Detailed information on citizens, other organizations, com- 
petitors, etc. 


The matrix proposes that high-quality is a desire emanating from
the government. It gives positive contribution to all other goals
except for the individual's goals 06 and 07 above. Furthermore,
high quality is supported by (receives positive contribution
from) goal 03, is opposed by goals 06 and 07, and is neutrally
preceded by all others.
The authors go on to use the matrix in order to roughly summarize
the overall conflict or consistency of overall goals between
government, citizen, and organizations. This is done by noting
whether the sign "+" or "-" is predominant in each "sector" of
the matrix above. This leads to the following sector-matrix:


        Gov.   Cit.   Org.


Gov.     +      -      +
Cit.     -      +      -
Org.     +      -      +


The authors suggest that the commonness of interests between
government and organizations, and their conflict with the individual
citizen's interests, especially 06 and 07, require the
set-up of official parliamentary controls.


In spite of HIGH QUALITY playing a role in the authors' approach, 
the term is not defined and an explicit justification is not 
given for its inclusion among the GOVERNMENT'S goals.


Two other authors, however, B. Hansen and A. Rickardsson, have
used the same matrix-approach in the context of an undergraduate
paper presented in 1970 at the Royal Institute of Technology
of Stockholm, Dept. of Information Processing. They analyze
the goals of an official public data-bank on the country's business
organizations, and they suggest that HIGH QUALITY of
data is

- HIGH CURRENCY (i.e. low "age")

- CORRECT CONTENT 

- COMPLETE COVERAGE 


The coverage of the target population is seen to be incomplete
to the extent that there are no possibilities to add new sources
in the systems design.


The correctness of the information is seen as the result of
proper COVERAGE and IDENTIFICATION of the target population.

As in the two previous statements, the definitions are not explicitly
given but they are in our own opinion rather implied
by the text. What we called correctness in the third statement
corresponds to "satisfactory presentation of results (satisfactory
from all points of view) to future consumers of statistics."
IN THE CONTEXT OF A THEORETICAL ANALYSIS OF ERRORS AND THEIR
CONSEQUENCES IN AN INTEGRATED CONTROL SYSTEM 


The authors develop some definitions of error based on the 
following: 


Consider a number of input-elements Xi, which
undergo a process Fi, and give a result-element
Yi.


One can thus write Yi = Fi (Xi). 


By ERROR, in this context it is meant that
Yi ≠ Fd (Xi) for at least one i, where Fd
stands for the DESIRED, i.e. the "RIGHT"
process. One can therefore also write the
definition of ERROR as


Fi ≠ Fd


since the input-elements must be regarded as
neutral from the viewpoint of the considered
process.


An extension of the above definition can be applied 
to defining 


RANDOM ERROR = The consequences of Fi not being 
identical to Fd for randomly distributed i 


SYSTEMATIC ERROR = The consequences of Ft not
being equal to Ft+1, where Ft+1 is right (t is
a time index).


THE PROBLEM OF DETERMINING WHETHER THERE IS SOME 
ERROR HAS NOW BEEN TRANSLATED TO THE PROBLEM OF 
DETERMINING WHETHER Fi is right, i.e. WHETHER 

Fi = Fd. 


In order to be able to start a system at all we 
must commit ourselves to a Fd on the basis of 
experience, and assume that it is RIGHT: sometimes 
we must terminate the search for the absolute 
TRUTH and start the system. Our assumption that 
the selected Fd is "RIGHT" does not actually imply 
that ERROR-CONTROLS are unnecessary - we have 
only prescribed a standard. 


Eventually the authors consider the error-thinking suggested 
by numerical analysis: Input-element (the number) is equal 
to the result-element (the measured value + error). They 
state that such understanding of error is obviously better 
in the case of continuous variables, but it is not adequate 
to illustrate e.g. keypunch-errors. They state that the 
former concept of error can be translated to their proposed 
"right/wrong" concept by establishing control limits (error 
limits). 
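
The translation of the numerical "measured value + error" view
into the "right/wrong" view can be sketched as follows. This is
only an illustration in Python under stated assumptions: the
function name wrong_results, the example process Fd (doubling
its input) and the data are hypothetical and are not taken from
the authors.

```python
from typing import Callable, Sequence


def wrong_results(xs: Sequence[float],
                  ys: Sequence[float],
                  desired: Callable[[float], float],
                  limit: float = 0.0) -> list[int]:
    """Return the indices i for which Yi is judged WRONG.

    Yi is WRONG when it deviates from Fd(Xi) by more than the control
    limit; limit=0.0 gives the strict definition Yi != Fd(Xi).
    """
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if abs(y - desired(x)) > limit]


# Hypothetical example: the desired process Fd doubles its input.
fd = lambda x: 2 * x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.05, 9.0]      # results actually produced by the Fi

strict = wrong_results(xs, ys, fd)               # every deviation is an error
tolerant = wrong_results(xs, ys, fd, limit=0.1)  # control limit of 0.1
```

With a limit of zero every deviation from Fd counts as an ERROR;
with a wider control limit, small deviations of a continuous
variable are accepted as RIGHT.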
Orlicky (1969)


IN THE CONTEXT OF INPUT DATA INTEGRITY AS ONE ASPECT 
OF SYSTEM OPERATION 


The computer system functions with full success 
only in a "perfect" environment, which would inclu- 
de ERROR-FREE, COMPLETE, and TIMELY data. When data 
lack INTEGRITY, a computer system tends to fail. 
The seriousness of the consequences will vary with 
the application. It may be minor where the computer 
is used as an analytical tool or rapid-fire calcu- 
lator. In these cases, resulting outputs are used 
for evaluation or as an intermediate step within 
some larger function, but they do not reflect ope- 
rating decisions. 


In computer-based operating support systems, however,
many such decisions are programmed for the computer
to make and low quality input data heavily contribute
to failures with far-reaching consequences.


The QUALITY of input data varies with their source. 
Accounting data are, as a rule, the most ERROR-FREE 
followed by engineering, purchasing, production con- 
trol, and marketing data, in roughly that order. 

The incidence of error is always highest in the
labor and production data being generated in factory
operations, particularly where production workers
themselves report (by whatever means) their activities
to the system.


INPUT DATA INTEGRITY results from education, discipline,
system checks, and the capability to investigate and
correct. System checks against input errors may be
classified as

1. The barrier or filter, i.e. programmed or manual 
capability to detect and reject incorrect trans- 
actions at the point of entry, by means of self- 
checking digits or diagnostic routines for com- 
parison with other files. 

2. Internal detection by checks made against the
file being updated.

3. Washing out residues, i.e. detecting and removing 
the effects of undetected errors by reconciliation, 
purging and close-out procedures. 
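
The first class of check, the barrier at the point of entry, can
be illustrated with a self-checking digit. The sketch below is an
assumption made for illustration only: the author does not say
which check-digit scheme is meant, and the widely used Luhn
(modulus 10) scheme is swapped in here. The function names are
hypothetical.

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn (mod 10) check digit for a string of digits."""
    total = 0
    # Walk from right to left, doubling every second digit,
    # starting with the rightmost digit of the payload.
    for pos, ch in enumerate(reversed(payload)):
        d = int(ch)
        if pos % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9          # same as adding the two digits of d
        total += d
    return str((10 - total % 10) % 10)


def passes_filter(code: str) -> bool:
    """Accept a transaction code only if its last digit is self-consistent."""
    return (code.isdigit() and len(code) >= 2
            and luhn_check_digit(code[:-1]) == code[-1])
```

A code keyed with one substituted digit, or with two adjacent
digits transposed, is then rejected at entry in most cases; the
scheme is known to miss the transposition of 0 and 9.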


The author sees FILE or DATA BASE INTEGRITY as distinguished 
from the above mentioned input data integrity: 


A single change of e.g. departmental boundaries in 
a manufacturing plant, may "explode" throughout a 
routing file calling for thousands of revisions. 
This problem must be met by adequate staffing and
budget for FILE MAINTENANCE.


Among aspects of SYSTEM DEVELOPMENT, the author mentions 
FILE CLEAN-UP during conversion to new format. Such conver- 
sion should then include AUDITS FOR ACCURACY. 
S. Owsowitz & A. Sweetland (1965)


IN THE CONTEXT OF A STUDY OF FACTORS 
WHICH AFFECT CODING ERRORS 


Information processing generally begins with making 
observations and recording them. Under modern infor- 
mation processing they are then keypunched. From 
this point on, the major part of processing is done 
by machinery which is almost ERROR-FREE. The errors 
occur in the inputs: the recording and keypunching. 


1. As a first approach, to date, the major effort
in solving the ERROR PROBLEM has gone toward
DETECTING errors in the documents themselves.


2. A second approach is to CONTROL error instead
of eliminating it. The statistical methods used
to randomize and balance error are a simple
illustration of control, as in the computation of
fiducial limits. Another way of controlling error
is to reconstruct the erroneous information to
yield a TRUE record.


3. A third approach is ERROR PREVENTION. This might 
be called "designing" human-factor elements into 
data-processing systems, in order to make the 
coding situation as error-free as possible. 


The authors consider the third approach as a way of impro- 
ving the VALIDITY OF THE DESCRIPTION of a system. They do 
so by concentrating the study on the coding-keypunching 
sequence of the overall coding process. They define these 
latter terms in the following way. 


Given that a component (system, black-box, bit or 
piece etc.) is in a status that can be described 
and coded, and given a sufficient and adequate code, 
the CODING PROCESS can be subdivided into a number 
of steps: 


1. The human observer examines the component and 
judges what its status is. 

2. Referring to his manual, he finds a word or 
phrase that describes his judgement. 

3. After finding the APPROPRIATE description he 
enters the corresponding code on the form. 

4. The form is reviewed by one or more people who 
may make corrections. 

5. The form is keypunched and verified. 


The series of steps 2. to 5. of the overall coding
process above is what was previously referred to
as the CODING-KEYPUNCHING SEQUENCE.


The authors state that if the description keypunched on the 
card ACCURATELY describes the status of the component, then 
the description is VALID. If the system CONSISTENTLY records 
the TRUE statuses of a large number of components, then the 
system is a VALID recording mechanism. Thus, the validity of 
a system is vulnerable at a number of places. The reported
study tries to answer the question: "what kinds of coding
reduce the validity at the coding-keypunching sequence?"
G. Rodin (1971)
IN THE CONTEXT OF DESIGN AND USE OF DATA BANKS 
FOR REAL-TIME SYSTEMS 


DATA QUALITY is a measure of the deviation of the 
data from the IDEAL value. Quality may be further 
subdivided in four groups: 


COMPLETENESS means that all information that 
should exist actually exists in the data bank. 
The concept also includes the requirement that 
there is no unnecessary, superfluous information 
stored in the bank. 


PRECISION and declaration of the degree of precision
is only of interest in the case of continuous
variables like when specifying the width of a
road: the data may be of no use if the PRECISION,
i.e. the ERROR LIMITS are not known. Precision
is particularly important if several users will
have access to the information: the precision must
then be good enough for the requirement of all
users. For future requirements it is also necessary
to specify how good the quality is.


CORRECTNESS. For most kinds of data which are 
stored in public information systems it can be said 
that they are either RIGHT or WRONG, e.g. birth 
date, social security number or marital status. 

For other continuous variables like e.g. temperature,
the correctness may be affected by two types
of errors: VALIDITY ERRORS when not measuring what
is believed to be measured, and RELIABILITY ERRORS
of the measured value itself. For instance a validity
error is made if one tries to establish the
position of a house by measurements on a map that
only shows the limits of the lot on which the house
is built, and it is assumed that the house lies
at the "analytical centroid" of the lot surface.
The reliability of the measurement data is determined
by the PRECISION with which this analytical
centroid is measured. The reliability then
depends upon the precision: if all values fall
within the error limits, the reliability is said
to be great.


CURRENCY. In the course of time, depending upon
updating procedures, different data become of different
age. In certain statistical applications it is
important to have information on the age of data.


The author goes on to discuss, as a separate point, the
issue of DATA SECURITY:
Security of a data bank system means:


- Protection against disturbances (interruptions)
of system operation.

- Protection of data against loss of data, change 
of data and particularly against UNAUTHORIZED 
CHANGE AND DISSEMINATION OF DATA (SECRECY). 

The latter is to be regarded as a necessary con- 
dition of high quality of data. 


The same author also discusses METHODS FOR OBTAINING 
HIGH DATA-QUALITY: 


There are possibilities for checks of inputs both
inside and outside the computer system. The outside
check may consist of verifying that CODING IS
CORRECT by requiring double input of the same data,
possibly coded by two different people and input
by two different people. Furthermore the system may
be programmed to respond to the first input by
requesting a confirmation and stating the importance
that the particular input be absolutely right.
The system may also furnish at some print terminal
a hard copy of the on-line input for proper visual
check against the original documentation.

The inside checks in the computer consist of the 
well known REASONABLENESS OR LIMITS AND VALIDITY 
TESTS. 


QUALITY CONTROL OF THE DATA in the data bank
may be performed on a continuous basis e.g. by
means of sampling followed by the above mentioned
types of checks. Statistics about the controls may
be later used to detect ABNORMALITY IN THE QUALITY
which may be an indication of serious quality
problems.


OBSOLETE AND UNNECESSARY data must be regularly
deleted, leading not only to higher quality but
also to economy in processing time.


One way to improve quality is to give a MEASURE OF 
QUALITY. It can be for instance a measure of some 
aspects of quality such as PRECISION and CURRENCY. 
A measure of the latter might be information about 
when the data was stored or updated the last time. 
Such measure will have to be specified and stored 
at the record or data-element level in case the 
quality is not the same for the whole data bank or 
file. Without such individualization the overall 
quality of the data bank will be determined by the 
weakest link, i.e. by the data with the lowest qua- 
lity. 
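
A measure of quality stored at the record level, as suggested
above, might look like the following sketch. It is an
illustration under assumptions, not the author's design: the
Record layout, the field names and the choice of "worst value
over the file" as the weakest-link figure are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    value: float
    error_limit: float   # PRECISION: half-width of the error limits
    last_updated: date   # CURRENCY: date of the last store or update


def age_in_days(record: Record, today: date) -> int:
    """Age of one record, derived from its stored update date."""
    return (today - record.last_updated).days


def weakest_link(records: list[Record], today: date) -> tuple[float, int]:
    """Quality of a file when only one overall figure is kept:
    the widest error limit and the greatest age among its records."""
    return (max(r.error_limit for r in records),
            max(age_in_days(r, today) for r in records))


# Hypothetical file of two road-width measurements.
records = [Record(7.5, 0.1, date(1971, 1, 10)),
           Record(6.0, 0.5, date(1970, 12, 1))]
overall = weakest_link(records, today=date(1971, 2, 1))
```

Keeping error_limit and last_updated on every record lets a user
judge each value separately instead of assuming the worst case
for the whole file.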


One way of checking the contents of a data bank is 
to furnish copies of the stored information to the 
inputters who have interest in its CORRECTNESS. Such 
procedure would also result in less fear or resistance
against the development and use of data-banks.
IN THE CONTEXT OF PRACTICAL GUIDELINES FOR THE
DEVELOPMENT OF MANAGEMENT INFORMATION SYSTEMS


A successful management information system is a 
system designed to provide the operational manage- 
ment with ACCURATE information upon which to make 
sound decisions. Success is the object of such a 
system. It must be management-oriented and the 
data, whether it be manual or automated, must be 
ACCURATE and available to the manager. 


The author develops the paper starting with two hypotheses
one of which is that management information systems have
failed because of inadequate attention to data-base
construction. Prior to stating nine data-base design criteria,
the author provides a basis of nine so-called "information
theory statements" some of which are given here below since
they apparently relate to the issue of quality of information.


5. The VALUE of information varies with its
USEFULNESS: Usefulness changes with time. The
degree of usefulness (from "critical" to "of marginal
value") should be a prime determinant in
choosing methods and frequency of collection,
transmission and storage.

6. Information use changes with age. All information 
passes through a continuance of stages of 
CURRENCY, from absolute currency, through histo- 
rical and to forgotten. The use of this data/ 
information varies with currency. 

8. The more PERTINENT the available information, the
better the decisions. Having the CORRECT data in
the correct place at the correct time is of
paramount importance.

9. Most information contains some ERRORS. One of the 
paramount tasks of all gatherers of data and pro- 
cessors of information is to lower the error rate. 
Time injects errors into data, for data are con- 
stantly changing. 


And among the nine data base design criteria: 


6. The system design must ensure that the data are
ACCURATE, CURRENT and accessible. Information
users quickly lose confidence in data which is 
obviously inaccurate either because of IMPROPER 
data input or because of OUTMODED data which 
should have been replaced. Accuracy may be 
checked at input, by preprocessor checks and by 
manual comparisons. The more data are used the 
more accurate they will become. The most effective
method of data purification remains data use.
Currency of data is a relative quality depending 
upon the function of the system. The update cycle 
is the key to currency. 
THE EFFECT OF THE PUNCHED CARD LAY-OUT ON THE
QUALITY OF STATISTICS


The following lay-outs were studied:


A. Fixed position and fixed length
B. Fixed length and variable order
C. Variable length, fixed order
D. Variable length, variable order


In former studies on punching errors, the authors observe, 
the FREQUENCY OF INCORRECTLY PUNCHED CARDS OR CARD COLUMNS 
has been used as a quality measure. In this study, the 
above measures were insufficient, since the same type of 
punching error might affect the information items quite 
differently depending on the type of layout (e.g. if the 
digit happens to be a field tag). 


A new kind of measure related to the need of VERIFICATION
is required. The AMOUNT of the EXACT DEVIATION between
the VALUE written on the form and the punched value gives,
for each individual item on the form, a measure of the
NEED OF VERIFICATION. However, such a measure is time
consuming to obtain manually, and therefore the NUMBER of
incorrect items and of digits are used as approximations
to the amount of exact deviation. The measure of the
number of incorrect digits included all digits immediately
to the right of the incorrectly punched one.


The investigation then relates the two new suggested 
measures to the total number of items and digits. In com- 
parison with the measures conventionally used, similar 
measures were included - the number of punched cards with 
incorrect values and the number of punch errors committed. 
The study used field-filled forms of the Swedish Agricul- 
tural Survey in June 1964 consisting of 1340 forms with 
place for 70 items each leading to a total of 93,800 items 
out of which only 22,000 had been filled with a total of 
41,200 digits. The following table summarizes the results: 


ALL TYPES OF ERROR TYPE OF LAYOUT
A B C D


Wrong items
- In percent of all items 1,2
- In percent of filled items 5,0


Wrong digits
- In percent of filled digits 5,3 1,6 11,59 5,5


The study proved that different layouts might influence
the quality of the statistics: in this case, B and C are
the most and the least favorable layouts respectively. Moreover
the results indicate that traditional quality measures 
are not able to discriminate between different punching 
layouts. The relative number of wrong items varied betwe- 
en 0,5 and 9,4 % for errors directly assignable to pun- 
ching layout. The corresponding relative numbers for in- 
correct digits varied between 0,5 and 9,6 %. 
At the conceptual level, the Berglund-Larson study is
also interesting because of the error-classification
scheme. Punch errors were investigated in order to
differentiate the importance of the following influence
factors, besides the punch layout itself:


ERRORS DUE TO THE NATURE OF ORIGINAL MATERIAL, such 
as bad handwriting, changes in the originally field- 
filled digits, and alternative forms of decimal figures. 


ERRORS DUE TO PUNCHED CARD LAYOUT such as 


IN LAYOUT A: - Displacement of item values to another
               place on the card
             - No card number or wrong card number (this
               error is also influenced by the choice of
               punched medium: card, paper or magnetic tape)
IN LAYOUT B: - Missing or wrong item identification for
               the item values
             - Displacement in some column (not whole field
               length) of the item value
IN LAYOUT C: - Missing field separation character between
               item values
             - Too many field separators between item
               values
             - Missing or wrong card number on the card
               (this error is also influenced by the choice
               of punch medium)
IN LAYOUT D: - Missing field separators between item values
             - Missing or wrong identity for item values


ERRORS DUE TO MISCELLANEOUS CAUSES such as
transposed digits, wrong digits when the original was
clearly readable, forgotten item values, wrong form 
identities and missing cards. The last kind of error 
is influenced by the choice of medium while the others 
may be related to the skill-degree of punch operators. 


ON THE NATURE OF ERRORS IN PUNCHING NUMBERS 


As referred by M. Jönsson in Mekanresultat 71008 (1971),
12 million numbers were keyed with no specified equipment
and procedures, resulting in 10,400 wrong numbers,
i.e. 0.08 %. Analysis of the errors in terms of digit
manipulation may be summarized in the following table
(average of percentages for adding and card punching
machines):


- insertion of digits             4 %
- omissions                       7 %
- single digit substitution      77 %
- multiple digits substitutions  12 %
HUMAN CODE TRANSMISSION


The experiment was set up to study in terms of information
theory (theory of signal transmission) some aspects of
operations where the operator's task is simply that of a
link or a "human code transmitter". The operator does not
PROCESS the coded information but has simply to render
TRULY both the SYMBOLIC CONTENT and the ORDER in which
the symbols appear.

ERRORS were defined as any difference in each position
of the code. Figures were however obtained
also for ERRORLESS TRANSMISSION, i.e. for entries
(whole codes) with no errors, compared with those
with AT LEAST ONE error.


Independent variables were code forms (letter, digit,
combined letter and digit), aural or visual
presentation, information content in terms of
information theory, rate of presentation and grouping
of items inside the code.

Dependent, studied variables were the number of 
errors (loss of information) and the percentage of 
errorless transmission (100 minus the percent of 
codes with at least one error). 


Special features of the experiment were e.g. the
deletion of the letter M from auditory experiments
to avoid its confusion with N; the adjustment of
the number of digits in relation to the number of
alpha letters in order to be able to compare
codes with the same information content but different
alpha content; avoidance of codes which contain
aids to the memory (such as for certain telephone
codes), and advance information to the subjects
of the experiment about the structure of the
codes to be presented (quantity of digits or letters),
and adequately long writing fields on the
forms, which the subjects knew should be completely
filled out.


The results show that errors began to occur for
codes with an information content of more than
20 bits (about four letters or five digits). The
experimentally determined frequency of errorless
transmission for the entire code was higher than
the frequency calculated from an assumed probability
of incorrect digits, derived from the number
of errors in reproducing 7-digit codes. This suggests
that errors are not uniformly distributed
over the codes, but have rather a tendency to
cluster.
Typical figures for errors were e.g. 2 errors for
8 symbols in alpha codes, or equivalently 10 symbols
in digit (numeric) codes. The figures were obtained
by averaging over a heterogeneous group of subjects.

For e.g. an 8-digit code the calculated probability
of errorless reproduction is about 35 % against
the experimentally found 65 % (approximate); for
a letter code the calculated probability of correct
reproduction is about 70 % against more than 80 %
experimentally found when considering a letter-code
length of the same information content (10 exp 8
possibilities) as the 8-digit code.
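
The "calculated" figures above rest on the assumption that
symbol errors occur independently of each other. A minimal
sketch of that calculation follows; the function names are
hypothetical, and the per-symbol error probability actually used
in the study is not reported here, so the sketch only shows what
such a probability would have to be.

```python
def p_errorless(p_symbol_error: float, length: int) -> float:
    """Probability that a whole code is reproduced without error,
    assuming each symbol fails independently with the same probability."""
    return (1.0 - p_symbol_error) ** length


def implied_symbol_error(p_code_errorless: float, length: int) -> float:
    """Invert p_errorless: the per-symbol error probability that
    would give the stated share of errorless codes."""
    return 1.0 - p_code_errorless ** (1.0 / length)


# 8-digit code: calculated 35 % errorless, observed about 65 %.
p_calculated = implied_symbol_error(0.35, 8)
p_observed = implied_symbol_error(0.65, 8)
```

Under independence, 35 % errorless 8-digit codes would correspond
to roughly a 12 % chance of error per digit, whereas the observed
65 % would correspond to roughly 5 %. The gap is what one expects
if the errors cluster in a minority of codes.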


PREDICTING CLERICAL ERROR 


A study aimed at predicting clerical error in an EDP
environment reports some findings from analysis of input
error in a highly automated bank central office.


Since error was an infrequent occurrence with regard
to the bulk of behavior, a laboratory approach
was economically prohibitive. The solution was to
locate a large amount of historical data on errors
made in encoding dollar amounts on money checks
for further MICR (Magnetic Ink Character Recognition)
processing.


The study gave some side-results, like indicating
that errors per 1,000 items listed (checks) varied
during a week between 1.002 and 1.203, the peak
rate being on Tuesday, typically the day of the
week with highest error rates. Furthermore the study
confirmed the negative relation between error-rate
and speed of listing, the fastest operators making
the fewest errors. Finally, a classification of the
kinds of listing errors showed that

digit substitution errors accounted for   62.4 %
omission errors for                       20
insertion errors for                       6
transposition for                          1
double substitution                        2
double omission                            2
double insertion                           1
miscellaneous                              5.
Besides the results above, the study actually aimed
at the development of predictive routines indicating
the item listed in error and the place within
the item, such as the last digit, or the two first
digits, etc. An explanation is now required for the
often used term "listing".


The setting of the study was a central location
where checks from outlying branches and banks are
brought at the end of the day's work to be listed
and then sorted to the maker's branch or bank.
The equipment used was check proof machines of
common make. The operator detects an error by noticing
a discrepancy between the incoming tape
total and her current master tape total. The predicting
routine had a goal of using a heuristic
approach to create a binary decision tree that by
processing of the correct list would simulate human
error and predict errors, to be used in the
investigations in search of the actual errors.
Out of 4,155 new errors, 46 % were correctly predicted
by the developed set of routines. These
46 % should be compared with the 10 % corresponding
to what should be expected from a straight
chance prediction, or 20 % when considering certain
obvious higher-probability errors such as that
a 3 is more often changed to an 8 than to a 1.


Note: as an implication of the initially mentioned
side-results' figures, it may be suggested that the error
rates (errors per 1,000 items listed), combined with the
listing volumes per day (varying during the week between
232,000 and 385,000 for 54 operators), would imply,
prior to correction procedures, the input of 240 to 420
errors per day into the system at that particular
installation.
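
The arithmetic behind this note can be checked with a short
sketch. Only the extreme rates and volumes are quoted above, so
the sketch pairs the lowest with the lowest and the highest with
the highest; this pairing is an assumption, and the function name
is hypothetical. The 240 to 420 of the note presumably comes from
each day's own rate and volume, which are not reproduced here.

```python
def errors_per_day(items_listed: int, errors_per_thousand: float) -> float:
    """Expected number of listing errors entering the system in one day."""
    return items_listed * errors_per_thousand / 1000.0


# Crude outer bounds from the extreme figures quoted in the text.
lower_bound = errors_per_day(232_000, 1.002)
upper_bound = errors_per_day(385_000, 1.203)
```

The bounds come to about 232 and 463 errors per day, so the range
stated in the note lies inside them, as it should.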


COPYING ALPHA AND NUMERIC CODES BY HAND:
AN EXPERIMENTAL STUDY


The identification of individuals or "items" in
an information system, as well as other requirements
for identification of e.g. transactions, implies
use of CODES. These codes are often groupings of
alphanumeric characters and they are likely to
be copied into forms, etc. by an increasing number
of people including the untrained general
public.
Against this background a study was made for comparing
error rates and speed when codes are presented
to the "copier" in different ways. In varying
degrees the following factors were investigated:

- distance between source code and copy 

- length of code 

- configurative grouping of digits within a code 

- all alpha or all numeric codes. 


The percent of wrong codes resulting from errors
in simple copying was in this way shown to vary
between 1.11 and 3.15 for codes of mixed lengths
of 3, 6, 9, and 12 digits under various conditions
of the other factors above.


When sorted in groups of same length, the codes
resulted in error rates varying from 0.33 % (for
length of 3 digits) to 4.19 % of wrong codes (for
length of 12 digits). The copying errors were also
analyzed by CRITERIA OF INCORRECTNESS and classified
in classes below, under varying combinations
of the earlier mentioned factors:

- Transposition     4.3 - 24.1 %
- Substitution     33.1 - 86.9 %
- Addition (+1)     1.9 -  7.2 %
- Omission (-1)     4.9 - 53.9 %


G.B. Davis (editor) - (1968) 
ON THE ACCURACY OF OCR (OPTICAL CHARACTER RECOGNITION) 
IN THE CONTEXT OF AUDITING OF EDP SYSTEMS 


In the context of discussing hardware features for control 
over equipment malfunctions, the author frames the OCR 
accuracy problem in terms of two rates: the REJECT rate 
and the ERROR rate. 


The reject rate is the percentage of documents rejected
because the equipment is unable to recognize the
character. At the state of technological development
around years 1967-1968 typical reject rates
were in the range 2 - 20 %.

The error rate is the percentage of documents which
were read but which contained one or more characters
incorrectly identified. The typical rates ranged
from less than 1 % of documents up to 2 %.


The reject rate is said to be significant in terms of
handling time and reprocessing. The significance of error
rate is dependent upon the application: 1 % error rate may
be quite acceptable for one application but totally
unacceptable for another.


IMPROVEMENTS IN DATA-ENTRY: GENERAL CONSIDERATIONS
AND KEY-TO-TAPE DATA ENTRY SYSTEMS


In a report on developments of data-entry devices, the
above issue of EDP Analyzer refers indirectly to experience
on input error-rates. For example, the input data
error rate is said to have been very good - less than
4 % - for keypunching of cards at a specific installation.
Conceivably it refers to rate after punch verification
and from what follows it apparently refers to number of
keystrokes rather than number of entries - in some sense.


In discussing the importance of easy correction capabilities
at entry devices, a reference is made to a report
by R.F. Carey who, in the June 1970 issue of Datamation,
states that 85-90 % of keying errors were immediately
sensed by the operators of specific entry devices which
allowed keying of entire records into an intermediate
storage device or buffer.


In discussing ACCURACY requirements, tolerable error rates
are said to vary anywhere from an average of one error
per 20 keystrokes up to and beyond an average of one error
in 10,000 keystrokes.


Accuracy requirements appear to be considered high and
demanding if they are set at about one error in 10,000
or more keystrokes in keypunching. When this error rate is
attained in typewriting for OCR input, it appears that
proofreading detects few of the errors. Accuracy is named
as being especially important e.g. in dealing with legal
documents.


The considered issue of EDP Analyzer is also interesting 
for its attempts to clear up the error issue at a more 
conceptual level. In discussing data-entry it separates 
the subject of verification from the subject of validation. 


VERIFICATION is defined as the process of assuring (through
detection and correction) that the data recorded on a
source document has been TRANSCRIBED ACCURATELY to machine
language.


VALIDATION is defined as the process of assuring that the
SOURCE DATA WAS CORRECT, by such means as logical checks,
control totals, check digit checking etc., i.e. more
generally by testing input data fields against some DATA
DEFINITION for those fields.


Also at the conceptual level it is interesting to note
that validation methods are considered as one of the types
of verification, implying some kind of conceptual overlapping
of the used words; it is stated for example that some
validation checks also perform verification, "but it is
incorrect to assume that all verification can be eliminated
by validation checks" (EDP Analyzer, Oct. 1971, n.8)




EDP Analyzer concentrates further on the subject of
verification, while validation is to be discussed in the
October 1971 issue. Other mentioned types of verification,
besides validation methods, are KEY VERIFICATION
and SIGHT VERIFICATION. In discussing criteria of choice
between these two methods, reference is made to a study
by R.C. Turnblade which reportedly classifies input data
in three types in terms of their MEANINGFULNESS TO THE
READER:


LANGUAGE TEXT such as name and address data, which 
is familiar and MEANINGFUL TO MOST PEOPLE. 


BUSINESS JARGON such as part names, part numbers,
business form entries which take on meaning to the
extent that a person becomes experienced in using
such types of data.


"NONSENSE" DATA, such as quantities and code num- 
bers, which are essentially not meaningful to the 
casual reader in the sense that he cannot tell 
whether it is RIGHT or WRONG just by looking at 
the number. 


As referred by EDP Analyzer, in discussing the criteria
of choice of method of verification, Turnblade uses

1. Types of meaningfulness (listed above)

2. Allocation of functions in creating the data - versus
entering it: also interpretable in terms of frequency
of repetition of task/familiarity of the operator with
the particular job.

3. Ease of correction

4. Accuracy requirements.


The criterion of type of meaningfulness interacts strongly
with that of allocation of function in that Turnblade con- 
ceivably considers that meaningfulness is a function of 
both the type of data (in terms of meaningfulness) and of 
whether the person entering the data is the same who cre- 
ated the source document. 


In summarizing part of the above discussion, in what con- 
cerns sight versus key verification, EDP Analyzer of Octo- 
ber 1971 states that sight verification is useful for data 
that can be verified in terms of words or phrases while 
key verification is needed where the data must be compared
on a character-by-character basis. 


Eventually, especially in the context of key-to-tape systems,
EDP Analyzer introduces a new terminology variant
by defining UNCORRECTABLE ERRORS as those source data
errors which are caught by validation checks. When such
a check (e.g. to see that a value falls within a specified
range, or is a member of a specific set of values) fails
(i.e. detects an error) during data entry, it means that
the source data is WRONG and it should be considered
UNCORRECTABLE (possibly meaning "by the operator") at the
entry stage. Attempts to correct such errors would heavily
affect the effectiveness of the entry process; the
offending field should rather be marked, bypassed and logged
for later human analysis. UNCORRECTABLE errors must therefore
not be confused with RESIDUAL errors, when the latter refer
to errors undetected at entry and introduced into the processing.


EDP ANALYZER (OCTOBER 1971)
IMPROVEMENTS IN DATA ENTRY, ESPECIALLY ON KEY-TO-DISC 
AND ON VALIDATION 


One case is reported where a 5 % entry error rate before
verification (not more closely specified) was obtained
with a direct data entry system with CRT (Cathode Ray Tube)
terminals. Switching over to a particular key-to-disc
system which also performed extensive validity checking
resulted in the error rate going down to about 4 %.


Experience from another installation is reported showing
that a 2 % error rate when using keypunch entry dropped
to below 1 % with the use of a key-to-disc system.


In the context of evaluating especially key-to-disc systems
it is noted that some validation checks can also act as
a verification check: the check digit is an example. Control
totals and inter-field relationships are worse examples
because of the possibility of errors compensating each
other and because of "legal wrongness".


In the context of VALIDATION FEATURES the following types 
of VALIDATION CHECKS are said to be possible with data- 
entry systems employing mini-computers: 

1. Character-set check

2. Value-set check

3. Range check

4. Check digit check

5. Control total balancing

6. Record count

7. Sequence check (if transactions have sequence numbers
   and have been sorted into that sequence)

8. Inter-field relationship checks

9. Field length check.
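
A few of the checks listed above can be sketched in code. The
following is a minimal illustration and is not taken from EDP
Analyzer: the record layout (fields `seq`, `account`, `unit`,
`quantity`), the permitted unit codes, the quantity range and
the simple modulus-10 check digit rule are all hypothetical
assumptions chosen for the example.

```python
import re

ALLOWED_UNITS = {"EA", "KG", "M"}   # hypothetical value set


def check_digit_ok(code):
    # Hypothetical rule: the last digit equals the sum of the
    # preceding digits modulo 10.
    digits = [int(c) for c in code]
    return sum(digits[:-1]) % 10 == digits[-1]


def validate_record(rec):
    """Return the names of the field-level checks that fail."""
    failures = []
    account = rec["account"]
    if not re.fullmatch(r"[0-9]+", account):       # character-set check
        failures.append("character-set")
    elif len(account) != 6:                        # field length check
        failures.append("field-length")
    elif not check_digit_ok(account):              # check digit check
        failures.append("check-digit")
    if rec["unit"] not in ALLOWED_UNITS:           # value-set check
        failures.append("value-set")
    if not 1 <= rec["quantity"] <= 999:            # range check
        failures.append("range")
    # Inter-field relationship: items counted "each" need whole quantities.
    if rec["unit"] == "EA" and rec["quantity"] != int(rec["quantity"]):
        failures.append("inter-field")
    return failures


def validate_batch(records, expected_count, control_total):
    """Return the names of the batch-level checks that fail."""
    failures = []
    if len(records) != expected_count:             # record count
        failures.append("record-count")
    if sum(r["quantity"] for r in records) != control_total:
        failures.append("control-total")           # control total balancing
    seqs = [r["seq"] for r in records]
    if seqs != sorted(seqs):                       # sequence check
        failures.append("sequence")
    return failures
```

A record for which `validate_record` returns a non-empty list
would, in the terminology used earlier, be marked and logged
for later analysis rather than corrected by the operator.
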


The author goes on to report some findings on how to
reduce SOURCE DATA ERRORS, since such reduction "... of
course will reduce the number of cases where the validation
checks will fail" (p.9). Apparently this refers to the
familiar concept of prevention. Two methods for reducing
source data errors are developed:

1. Field and code design 

2. Design and use of source documents. 


Besides reporting extensive experience of the economy
and the effectiveness of the entry process, the author 
refers to a report by R.C. Turnblade containing summaries 
of "nominal" error rates obtained from numerous sources, 
and restates the findings in the following table on 
NOMINAL ERROR RATES PER 10,000 KEY STROKES, where 


                      MANAGEMENT EMPHASIZES:

                      Accuracy     Speed
Language text              2        100
Business jargon            5        100
Nonsense data            100        200


Such data seem to be in line with others reported by
Johanningsmeier, who is cited as reporting production error 
rates of 1 to 2 per 10,000 for text and jargon. 


W.H. Emmons et al. (1970)


A COMPARISON OF THREE NUMERIC KEYBOARDS 


An experiment is reported having the purpose of comparing
the performance of inexperienced operators at different
types of 10-key keyboards with which they were unfamiliar.


Initially the experiment consisted in having the operators
key 1,000 sets of randomized 5-digit numbers on each of
three keyboards. The numbers to be keyed were presented to
the subjects via a CRT display connected to a computer. The
computer was programmed to calculate the number of
UNDETECTED ERRORS, i.e. errors not corrected by the
subjects themselves: the subjects had the possibility of
repeating the digit entry if they realized that they had
made an error.


The percentage of keystrokes with undetected errors varied
between 0.37 and 0.39 % while the keying speed was in the
range of 1.29 to 1.33 keystrokes per second. After
discounting INVALID CHARACTER ERRORS, i.e. errors caused
by the keyboard hardware, the percentage of errors (i.e.
errors/EFFECTIVE KEYSTROKE, not counting keystrokes
corrected by the operator) varied between 0.32 and 0.37.


Since the performance of the operators improved with time 
during the successive sessions of the experiment, the 
last sessions were dedicated to gather statistics on the 
performance of four keyboards of the same type (but with 
slight functional differences) as one of the previously 
used. The keying rate proved to vary between 1.31 and 
1.49 (average number of effective keystrokes/second) 
while the % of errors (undetected errors per effective
keystroke) varied between 0.17 and 0.38.


NUMERICAL ERROR CHECKING 


The author states the purpose of gathering some statistics
on error-checking. Some studies, e.g. Conrad & Hull's
(1967), place the emphasis on speed, and checking is
discouraged.


The study was performed trying to answer two basic
questions:

1. What is the effect of grouping digits on the speed
   and accuracy of error-checking?

2. How does the frequency of errors to be detected
   affect the speed and accuracy of error checking?


Only numerical material was used. Both experienced and
"naive", i.e. inexperienced, subjects were asked to compare
numbers to be checked, which were printed on pairs of
separate pages. The task was to mark those digits which
were different.


Three different error probabilities were used: 0.1, 0.01
and 0.001, where error probability is defined as the
proportion of digits on one of the two sets of pages
that were different from the digits on the other set
in the corresponding comparison-place. For example, for
error probability 0.01 approximately one digit in 100
was changed on one sheet of each pair. The following
results were obtained:


Naive (N)          Error          Percent digits   Percent
Experienced (E)    Probability    not detected     residual errors

N                  0.1             4               0.4
E                  0.1             2               0.2
N                  0.01           13               0.13
E                  0.01           13               0.13
N                  0.001          24               0.024
E                  0.001          17               0.017
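
The last column of the table follows from the other two: the
residual error percentage is the error probability multiplied
by the percentage of erroneous digits not detected. A minimal
sketch of that arithmetic, with a hypothetical function name
and two rows taken from the table:

```python
def residual_error_percent(error_probability, percent_not_detected):
    # Share of all digits that remain wrong after checking, in percent.
    return error_probability * percent_not_detected


# (error probability, percent of erroneous digits not detected)
rows = [(0.01, 13), (0.001, 24)]
for p, missed in rows:
    print(p, missed, round(residual_error_percent(p, missed), 4))
```
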


PRODUCTIVITY AND ERRORS IN TWO KEYING TASKS: 
A FIELD STUDY 


The investigation aimed at measuring productivity and 
error rates for a billion responses by more than a thou- 
sand operators of card punches and bank proof machines 
in twenty different installations. The authors studied 
the influence of time on the job (experience) and of
individual differences among operators.


The percentage of errors caught in an independent
verifying procedure for card punching was in the range
0.02 to 0.06. No data is reported on errors which the
operator himself detected and corrected, and it is not
clear whether the verifying procedure was a punch
verification. This is however probably the case in view of
the nature of the studied environment; it also explains
why no data were available on the residual, undetected
errors after verification.


For bank proof machines, the figures are given in terms
of percent of transactions (checks), and errors averaged
0.03 % per check, not including errors caught by the
operator himself in checking the total of his machine
against the supplied control total.


Special features of the investigation were e.g. that
no errors in the cents or dime positions were counted.
The same applied to those errors which were conceivably
caused by poorly written numerals or by certain PROCEDURAL
MISTAKES.


E.T. Klemmer (Personal Communication) (1964) 


(referenced in Smith, 1966)
HUMAN RELIABILITY: SOME OBSERVATIONS 


W.A. Smith (1966, p.14) reports that E.T. Klemmer in
1964 indicated that the average telephone user dials one
percent of digits incorrectly. Two thirds of these errors
are detected by the user himself in the course of dialing
while the rest (about 0.3 %) is caught by the system
(e.g. as a "non-existent" number) or results in WRONG
numbers. Of those errors not detected by the customer,
two thirds can be allocated to the dialing of wrong
digits (usually one unit off) and the other third to
having the wrong number in mind or failing to dial
enough digits.


GROUPING OF PRINTED DIGITS FOR MANUAL TELEPHONE ENTRY 


One of the common problem areas underlying all manual 
entry of numbers (here defined as a linear array of 
digits presented simultaneously) is how to group them 
visually for optimum performance by the average user, 
says the author. 


He reports six experiments whose purpose was to see if
the major previous findings favoring groupings by 3's
and 4's would hold for numbers of different lengths, users
of different skills, and various orders of presentation.


The different skills of subjects were: technical or 
professional job classifications, clerical-secreta- 
rial, and shop workers. 


It is not clear to us whether errors were defined as
including or excluding those self-detected by the subjects.
In some of the experiments, errors were immediately
signaled by the experimenter to the subjects, allowing for
correction, while this appears not to be the case in
others. The percent figures seem to stand for percent of
cards with one or more errors per grouping or per subject.
The study includes some figures about relationships
between time per entry and error rates.


Error rates in the six experiments were all less than
1 % when averaged over groupings and subjects. None of
the experiments showed a statistically reliable difference
in errors as a function of grouping, nor was there any
consistency over experiments. Large individual differences
between subjects were however found with respect to rates
of committed errors in the course of the experiments, which
were all concerned with the overall process of looking at
printed numbers and entering them on a push-button
telephone.


HUMAN FACTORS PROBLEMS IN THE USE OF PUSHBUTTON
TELEPHONES FOR DATA ENTRY 


In an attempt to uncover some of the basic human factors
problem areas, Kramer reports some results of the analysis
of user performance (in terms of speed and ACCURACY)
in using pushbutton devices for data entry. First come
three cases of analysis of FIELD data which describe
observations of REAL use of pushbutton telephones for data
entry.


1. IN A PRODUCTION REPORTING FIELD-TRIAL. 


Worker ERRORS were classified as : 
- PROCEDURAL - e.g. sending data before answer- 
back tones had ended. 
- HAND KEYING - e.g. adjacent digit substitution
and digit omissions 
- OMISSIONS - i.e. failure to make a report 


An analysis of entries of up to 19 digits (including
prepunched information) made by 44 workers revealed an
OMISSION RATE of 8 %, where the rate includes entries
corrected by the workers and the percent is given in terms
of entries. The PROCEDURAL ERROR RATE was about 4 % and
the HAND KEYING RATE about 3 %. The figures should be
considered with care since entering data before answerback
tones had ended had an exceptional effect at one of the
several (10) studied locations.


About half of the procedural and hand-keying errors were
corrected, decreasing the total error rate from about
15 % to 11 %. It appears that the corrections were those
motivated by immediate self-detection by the subjects,
or thanks to error-answerback tones at the entry device.


2. ACCESSORY ORDERING - FIELD TRIAL


Omission errors could not be detected since NO INDEPENDENT 
SOURCE DOCUMENT was available to compare what the users 
ordered with what should have been ordered. This excludes 
from the error count also the ordering of completely wrong
items or wrong quantities. 


For order-messages of up to about 30 digits (including
prepunched information), the PROCEDURAL error rate (e.g.
failure to enter either or both of the prepunched card
fields, for instance for station identification) was
about 23 %, giving a residual after corrections of about
9 %. The HAND KEYING error rate was 5 %, leading to an
uncorrected, i.e. residual, rate of 0.3 %, mainly due to
the use of self-checking item-code numbers which made
possible the returning of error-answerback tones to the
user. The TOTAL ERROR rate thus went from 28 % to a
residual 9 %.
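
The relation between the original and residual rates quoted
for this trial can be checked with a few lines of arithmetic.
The sketch below only restates the figures given above; the
dictionary layout and the function name are illustrative
assumptions.

```python
# (original rate %, residual rate %) per error class, as quoted in the text
RATES = {
    "procedural": (23.0, 9.0),
    "hand keying": (5.0, 0.3),
}


def totals(rates):
    original = sum(orig for orig, _ in rates.values())
    residual = sum(res for _, res in rates.values())
    corrected_share = 1.0 - residual / original
    return original, residual, corrected_share


original, residual, corrected_share = totals(RATES)
print(original, round(residual, 1), round(corrected_share, 2))
```

The summed residual comes to 9.3 %, which the report rounds to
9 %; about two thirds of the original errors were corrected.
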


3. AN OPERATIONAL CREDIT-AUTHORIZATION SYSTEM IN A
DEPARTMENT STORE 


Upon receipt of an inquiry message of up to 16 digits,
a computer reviewed credit information about the indicated
customer account and then commanded an audio response
unit to compose the appropriate reply. A sample of the
entries at one of seven possible input channels was
analyzed, and the voice response generated by the computer
indicated that about 20 % were calls containing at least
one user error. Because of the circumstances neither TRUE
nor residual error rates could be determined in relation
to the total set of users and input devices.


Upon analysis of the results from the three field studies
Kramer identifies three basic human factors areas:

1. User instructions and training, which were quite
   unsatisfactory in the studied situations.

2. Data entry formats and procedures. 

3. Feedback and knowledge of results in form of e.g. 
answerback tones. 


In addition, Kramer reports some LABORATORY experiments
on aspects of user performance in transmitting combined
alphabetic and numeric information using a keyboard
containing only 10 or 12 buttons. Subjects were assigned to
three groups using three different entry methods. Each
subject entered about six orders for ten items each; the
details of the study suggest that each subject group
entered about 35,000 characters.


ERRORS (both corrected and uncorrected) were classified as 
- PROCEDURAL 
- TIME GATE OR TIME DELAY 
- ALPHABETIC 
- NUMERIC 
The sum of uncorrected and corrected errors was referred
to as the "ORIGINAL" error rate, while uncorrected errors
were referred to as the "RESIDUAL" rate.


The largest contributors to procedural errors were
numeric/alphabetic mode-shifts, showing a residual rate of
one out of every 50 mode-shifts. The largest contributor to
timing errors was keying letters too slowly: the residual
rate for timing errors was one error for every 89 LETTERS.
The maximum residual rate for alphabetic errors was 1:61
letters, and for numeric errors 1:384 numeric characters.


Kramer concludes his paper by emphasizing the importance
of motivational and procedural aspects of entry for total
system performance.


IN THE CONTEXT OF AN INTRODUCTION ON INPUT TO
COMPUTERS BY MEANS OF PUNCHED MEDIA 


The author mentions that investigations have shown that
about 0.3 % of punched characters are in error. Punch
verification done after card punching usually reduces the
above figure to 0.03 %. If punch errors in the punch
verification process were committed at random, the
expected rate after verification would be much lower.
The difference may be attributed to the fact that certain
kinds of substitutions of digits or misreadings of
handwritten digits (or letters) have a higher probability
of occurrence than others, says the author.
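
The remark about random errors can be made concrete under a
simplifying assumption that is not in the source: suppose the
verifying operator errs independently of the original operator
and at the same rate of 0.3 %, and that a punch error can only
survive when the verifier also errs on that character. The
product of the two rates is then an upper bound on the
surviving error rate. The function name is hypothetical.

```python
def expected_residual_if_independent(punch_rate, verify_rate):
    # Upper bound: both operators must err on the same character.
    return punch_rate * verify_rate


punch_rate = 0.003          # 0.3 % of punched characters in error
observed_residual = 0.0003  # 0.03 % after verification, as reported

bound = expected_residual_if_independent(punch_rate, punch_rate)
print(bound, observed_residual / bound)
```

Under this assumption the bound is about 0.0009 %, more than
thirty times below the reported 0.03 %, which is consistent
with the view that errors are not committed at random.
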


Langefors goes on to observe that punch verification
cannot catch errors made by the people who create the
source document, in writing down the original figures. If
it is assumed that source errors are made at the same rate
as above, 0.3 %, and that they cannot be detected by e.g.
control totals, punch verification will only detect 27 out
of 60 erroneous characters in every 10,000 characters, i.e.
less than 50 % of such errors.
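
The figure of 27 out of 60 follows from the rates already
quoted. A short sketch of the arithmetic, with hypothetical
names and with counts expressed per 10,000 characters:

```python
def verification_yield(source_errors, punch_errors, punch_errors_left):
    # Punch verification only catches punching errors, never source errors.
    caught = punch_errors - punch_errors_left
    total = source_errors + punch_errors
    return caught, total, caught / total


# Per 10,000 characters: 0.3 % source errors, 0.3 % punch errors,
# 0.03 % punch errors remaining after verification.
caught, total, share = verification_yield(30, 30, 3)
print(caught, total, share)
```
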


Langefors gives an example where a data entry device 
working on punched media with 0.3 % error rate, would 
inject at least 18 errors per hour of operation, into
the system, if no other checks were performed. 


Since such other checks are not performed in many
administrative applications of EDP, one can ask
how it has been possible to obtain meaningful results
in such applications. The explanation is that
administrative EDP is made on the basis of a LARGE
NUMBER OF SEPARATE, SMALL TRANSACTIONS. An error
rate of some tenths of a percent of the transactions
is not a large burden in an administrative application
where even OTHER ERROR SOURCES exist.


On the other hand, the effect of occasional errors
in a scientific EDP application may be of decisive
importance for the results. Fortunately, in large
mathematically complicated computations it is possible
to design mathematical checks that detect most input
data errors. It appears that THE VERY LARGE
NUMBER OF DATA which are used in the computation
is what also makes possible the mathematical checks.


In addition to other error detection methods, Langefors
also mentions the well known check digits. In another
work (1968b, p.370) he refers to an investigation where the
percent of wrong characters (in this case digits) proved
to be 0.1 % in punching. The use of a check digit detected
about 99 % of the errors and consequently reduced the
undetected punch errors to 1/100, compared with the
verification reduction to 1/10 mentioned above.
Furthermore the author notes that check digits (whenever
practical) also permit detection of some errors in writing
the source documents, resulting in a further improvement
of the overall detection rate.
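
The source does not say which check digit scheme was
investigated. As an illustration only, the sketch below uses
the widely known Luhn (modulus 10) scheme, which detects every
single-digit substitution and most transpositions of adjacent
digits. The function names are hypothetical, and the 99 %
figure above is not a claim about this particular scheme.

```python
def luhn_check_digit(payload):
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:      # double every second digit, starting at the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def is_valid(number):
    """True if the last digit is the Luhn check digit of the rest."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


code = "7992739871"
full = code + str(luhn_check_digit(code))
print(full, is_valid(full))
```

A keyed number that fails `is_valid` can be rejected at entry
time, whether the wrong digit was introduced in punching or
already in writing the source document.
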


J. Martin (1969)


TELECOMMUNICATIONS AND THE COMPUTER 


Computer data may be transmitted through land-based and 
through high-frequency radio communication links. Such 
links introduce their own errors in the data, through 
distortion or noise. Martin offers some statistics which
have been gathered in this respect.


Typical, most probable error rates are stated: 


1. On 50-baud telex lines - one bit error per 
100,000 or one bit error per 50,000 transmitted 
bits corresponding to between one and eight 
character errors in 100,000 transmitted charac- 
ters. In terms of time this corresponds to be- 
tween one error in half an hour - and one error 
in about four hours.


2. On 200-baud telegraph lines - somewhat better 
results than above; about one bit in error per 
100,000 transmitted.


3. On 600 to 2,000 bits/second voice grade lines, 
further improved error rates, varying between 
1/500,000 and 1/100,000. 


4. On high-frequency radio circuits, which should
be avoided in the transmission of computer data, 
a typical error rate is one character per 1,000 
transmitted, before correction. 


After usual detection and correction procedures 
(by code or by retransmission) many systems might 
improve the level of undetected errors from 
1/100,000 to 1/10,000,000 bits. One available co- 
ding scheme for reduction of undetected error rate 
will reduce it to 1/(1 x 10 exp 14).


For code-detected retransmission methods in high 
frequency radio circuits the undetected error rate 
may at certain bad periods of time rise to 1/16,000 
characters or even 1/160 while the effective speed 
of the link would drop to perhaps 90 % and 50 %,
respectively, of the nominal speed.


Martin mentions that other components of a computer system 
(other than telecommunication links) such as tape or file 
channels have a much lower error rate than the rates of 
undetected errors of telecommunication links in conventio- 
nal use today. 


EVALUATION OF INPUT DEVICES FOR A DATA SETTING TASK


A study evaluating a set of four types of numeric manual
entry devices used the criteria of ERROR RATE, ENTRY TIME,
and OPERATOR PREFERENCES.


Inexperienced operators keyed 10-digit numeric data
words on 10-key keyboards and attained an average of
0.6 % of entries containing one or more errors.


The subjects' own handwritten data word served as the
criterion against which the manual entry was checked for
ACCURACY. Therefore poorly written numerals could barely
influence the error rate.


REDUCING TELEPHONE NETWORK ERRORS 


The technical feasibility of a data communication
system depends upon its FREEDOM FROM DATA ERRORS, its
probability of detecting errors that do occur, and
its efficiency in overcoming the effects of errors.


Errors are introduced into data systems by both 
HUMANS and HARDWARE. Those errors which are attri- 
butable to hardware may result from either EQUIP- 
MENT MALFUNCTIONS or RANDOM TRANSMISSION INACCURA- 
CIES. 


This study limits itself to errors due to TRANSMISSION
INACCURACIES in normal voice band data transmission
over the USA switched telephone network.
Furthermore, the report deals only with statistics 
on error-free reception of long blocks (message 
formats) of length from 10,000 up to 300,000 bits 
of data. 


The paper mentions previously available statistics of an
average error rate of about 3/100,000 bits. However, since
errors happen to be clustered, i.e. not uniformly scattered
throughout the data, there are frequent long intervals
of time which are completely error free. This explains
why the error-free percent of long messages is much higher
than would be theoretically expected in the case of a
uniform distribution of errors. Figures are given of e.g.

18 % for messages of 2 million bits

65 % for lengths of 200,000 bits

74 % for lengths of 100,000 bits
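
The gap between the observed figures and the uniform case can
be illustrated by a short calculation. The sketch assumes,
purely for comparison, that bit errors occur independently at
the quoted average rate of 3 per 100,000 bits; the function
name is hypothetical.

```python
def p_error_free_uniform(message_bits, bit_error_rate=3e-5):
    # Probability that no bit is in error when errors are independent.
    return (1.0 - bit_error_rate) ** message_bits


observed = {100_000: 0.74, 200_000: 0.65}   # reported error-free shares
for bits, reported in observed.items():
    print(bits, reported, p_error_free_uniform(bits))
```

Under this assumption roughly 5 % of 100,000-bit messages and
well under 1 % of 200,000-bit messages would arrive error
free, against the reported 74 % and 65 %.
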


In summary, the report mentions that the probability of
error-free reception is reasonably large, i.e. in the 
range 0.6 to 1.0 and that those messages which do 
have errors tend to contain most such errors. A study of 
the effect that time of the day has on errors shows that 
calls placed at night contained twice the percent of 
error-free messages as those calls made during daytime. 


The report gives some detailed calculations which illus- 
trate the kind of error-thinking in the context of data 
transmission: 


The above error rates refer to "TRUE" ERRORS as verified
in experimental situations. In practice one works with
additional concepts such as RATES OF UNDETECTED ERRORS,
which refer to messages that are free from PARITY-CHECK
FAILURES, i.e. messages with errors undetected by parity
check procedures. This, by the way, introduces a new
specific meaning of UNDETECTED in quality-terminology.


It is interesting to note in this context that due to the 
characteristic clustering of errors both inside a charac- 
ter and inside the whole message, long messages accepted 
without parity failures are likely to show lower rates of 
hidden (undetected) errors than the rates obtained in 
retransmitting individual characters or short blocks
until they are accepted free from parity failures. 


In a typical calculation, for 200,000-bit messages
consisting of 25,000 8-bit characters:

The probability that the message is TRULY error-free     0.65

The probability of undetected errors existing in
the message without parity failure                       0.02

Consequently the probability of a message
APPEARING to be error-free                               0.67


Since the incidence of undetected errors in messages free
from parity errors is known to be quite low, the author
mentions that such statistics may be difficult to obtain,
as it is difficult to discriminate these errors from what
are designated as DATA HANDLING ERRORS.


Illustrating further the use of the above figures in
a typical calculation, the author mentions that if the
above messages of 200,000 bits are repeated until received
without parity failures, then each call must be made on
the average 1/0.65 or about 1.5 times. Once all messages
are received without parity failures, one will still have
a residual probability of 0.02 of each message containing
undetected errors.


The OVERALL CHARACTER ERROR RATE IN ACCEPTED DATA would
then be 0.02/25,000 = 8 x 10 exp -7, which is two
orders of magnitude smaller than that achieved by
retransmitting individual characters until received
without parity failures. This advantage is obtained at the
cost of longer overall transmission time.
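
The two steps of this calculation can be written out as
follows. The sketch uses the figures as the source applies
them (0.65 for acceptance, 0.02 for undetected errors, 25,000
characters per message) and, like the source's formula,
appears to count one erroneous character per affected
message. The function name is hypothetical.

```python
def block_retransmission(p_accepted, p_undetected, chars_per_message):
    # Mean number of attempts until acceptance (geometric distribution).
    expected_calls = 1.0 / p_accepted
    # Residual character error rate in the accepted data.
    char_error_rate = p_undetected / chars_per_message
    return expected_calls, char_error_rate


calls, rate = block_retransmission(0.65, 0.02, 25_000)
print(round(calls, 2), rate)
```
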


J. Orlicky (1969)


IN THE CONTEXT OF INPUT DATA INTEGRITY FOR 
SUCCESSFUL OPERATION OF EDP SYSTEMS 


Orlicky, without giving any specific definition of
errors, states that typical error rates run between
1 % (very good) and 3 % of collected transactions. Thus
a job shop with 1,000 employees, which may report, say,
7,000 labor, production, and material movement
transactions per day, can be expected to generate 100 or
200 errors every day.


S. Owsowitz & A. Sweetland (1965) 


FACTORS AFFECTING CODING ERRORS 


This is a research memorandum related to a project
concerned with the so-called maintenance management of the
US Air Force. It reports the results of a number of
experiments which, the authors say, explore the possibility
of "designing" human factors elements into EDP systems.
Human subjects coded a variety of data in a number of
ways with the purpose of determining which methods
resulted in the fewest errors.


Air Force maintenance personnel were used as subjects of 
the experiments, in which their coding routine resembled 
their method of recording real-world maintenance data. 
Their coded information was keypunched and the resulting 
decks were analyzed to determine what factors led to the 
highest and lowest error rates. 


Coding was in this context defined as the translation
of a judgement into a form suitable for machine processing,
and the study limited itself to three-digit (alpha and/or
numeric) codes. INDEPENDENT VARIABLES in the various
series of experiments were e.g. alpha content (i.e. the
proportion of code digits that were alphabetic),
positioning of the alpha-numeric content, knowledge on the
part of subjects and keypunchers about the allowable
("legal") content alternatives, and use of mnemonic codes
or codes with familiar letter patterns.


In experiments such as these it is possible to speak of
TRUE (rather than DETECTED) error rates after keypunch and
verification, varying between 1.2 % and 16.4 % wrong
entries as a proportion of all code entries. Error analysis
in practical applications usually refers to DETECTED
(and therefore IDENTIFIABLE) errors, with rates typically
in the range 1 % - 5 %. Such detections usually refer to
detections through programmed validity checks. Since such
checks are based on the "legitimacy" of certain digit
combinations, in terms of communication theory this
indicates that machine-detected error rates may in fact
correspond to TRUE error rates 2-3 times higher, the
difference being due to the UNDETECTABLE errors.


J.A. Perlman (1963)


IN THE CONTEXT OF DISCUSSING DATA COLLECTION FOR 
BUSINESS INFORMATION PROCESSING 


In a report on data collection devices available on the
market, Perlman points out that experience at one
installation using equipment with error-detection
capability of lesser sophistication indicates a
RETRANSMISSION RATE (error detected while the operator is
still at the remote station) of around 0.5 % and an
UNDETECTED rate (that in this context refers to detection
by the system after the data collection step) of less
than 0.1 %.


Another installation using data collection devices of a
higher sophistication is reported as having operated with
an undetected error rate of less than 1/100,000 characters.
It is not clear whether the above figures are in
terms of characters too, or rather in terms of entries.


MAN-COMPUTER COMMUNICATION TECHNIQUES:
TWO EXPERIMENTS 


This study recognizes that present computer technology
no longer requires man to communicate indirectly with the
computer through the medium of punched cards or tape.
The two related experiments evaluated alternative
man-computer communication techniques relevant even for
on-line communications.


Five primary variables affecting man-computer interaction 
were isolated and manipulated to various degrees: 
- word form (full word or abbreviations) 
- syntax 
- format (fixed or variable length, tagged field) 
- equipment (written, voice, teletype transmission) 
- procedures (allocation of work between the 
interpreter-coder and the communicator-operator) 


Subject performance was analyzed in terms of time and 
of errors. ERRORS WERE CLASSIFIED as:
- spelling: any misspelled word 
- omission: failure to enter a required item of 
information 
- content: wrong information, e.g. incorrect iden- 
tification or coding of event 
- sequence: information items in the message not 
in proper sequence. 


One experiment involved 20 subjects, who were trained
interpreters of aerial photographs, using real system
messages. They composed target reports from simulated
pictures, and then either teletyped them immediately while
composing (direct entry), or handwrote or voice
tape-recorded them for subsequent teletyping either by
themselves or by another "communicator". The messages had a
maximum of 224 characters if in fixed field format but
otherwise their length is not stated. The subjects were all
trained teletypists above a minimum speed of 35 w.p.m.


Errors are presented in terms of average number of errors
per image-frame to be reported as military intelligence
information. The average of UNDETECTED errors per image
in the experiment varied between approximately 1.4 and
2.4. Detected errors were defined as those detected
(and corrected) by the person entering the report in the
computer-readable mode.


Some degree of leniency was used in scoring errors.
Although the transcribed reports would no doubt
have been found to contain more errors than reflected
in the present analysis if subjected to a computer
input edit program, it was felt that several
steps would be taken in an operational system (such
as increased training time) which would overcome a
major portion of the ERROR PROBLEM. IN PARTICULAR,
CONTENT AND OMISSION ERRORS WERE SCORED LENIENTLY
with only MISIDENTIFICATION OR OMISSION OF TARGET
items or other critical information being scored as
errors.


The authors present no error figures for the second of the
two experiments since no meaningful differences were found
between the effects of two word form variations and three
format variations.


W.A. Smith Jr. (1966) 


ACCURACY OF AUTOMATED DATA COLLECTION IN 
PRODUCTION INFORMATION SYSTEMS 


The figures reported by Smith refer to a more complex 
situation which includes many types of "errors" which 
are outside the frame of reference - in some sense - of
most other investigations. 


Smith's findings indicate that the percent of wrong
entries varies in the range 6.8 % - 26.1 %.
AFTER APPLYING THE OPERATOR'S OWN, AND THE SYSTEM'S
DETECTION AND CORRECTION PROCEDURES that were available,
the percent of RESIDUAL ERRONEOUS ENTRIES varied in the
range 3.4 % - 5.6 %.


The definition of errors in this investigation included 
- omitted entries (failure to record an event) 
- misidentification 
- miscount 
- wrong sequence (of partial entries in a complex 
message) 


The field study to which the above figures apply
displayed the following independent variables or
environmental parameters:

- individual recorder differences (combinations of
worker and device, accuracy of entries of the same
worker as a function of [illegible])

- differences between work shifts (implying different 
workers, supervisors and recording procedures) 

- differences between work sites (continuous assembly 
line versus job shop with variable operations and 
routing, each having messages of different compli- 
cation and length) 

- use of pre-assigned media (e.g. pre-punched cards 
and worker's identification badges to be inserted 
in a shop terminal) versus manual entry.


The field study was complemented with an experiment with 
the purpose of studying the effect of different message 
lengths and of time pressure on making entries. 


The dependent variables studied were, in particular, the total
number of errors (erroneous entries) and the RESIDUAL number of
errors, i.e. those remaining after detection and correction
were applied. The results of the experiment were also used to
determine the kinds of manipulation recording faults made in
copying digits. It appeared that about 60 % of such faults were
caused by single digit substitution, another 20 % by
single digit omission, while the rest consisted of double
substitutions, double omissions, insertions, transpositions
and miscellaneous faults.
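
The fault categories named above can be illustrated with a small
sketch. It is not Smith's procedure: the function name, the category
labels and the matching rules are assumptions made here only to show
how a recorded digit string might be compared with the intended one.
Double omissions and anything more complex fall under "miscellaneous"
in this simplified version.

```python
from collections import Counter


def classify_copy_fault(intended: str, recorded: str) -> str:
    """Classify how a recorded digit string deviates from the intended one.

    Hypothetical helper; the categories follow the text above, the
    matching rules are illustrative assumptions.
    """
    if intended == recorded:
        return "correct"
    n, m = len(intended), len(recorded)
    if n == m:
        diffs = [i for i in range(n) if intended[i] != recorded[i]]
        if len(diffs) == 1:
            return "single substitution"
        if len(diffs) == 2:
            a, b = diffs
            # Two adjacent digits that have merely changed places.
            if (b == a + 1 and intended[a] == recorded[b]
                    and intended[b] == recorded[a]):
                return "transposition"
            return "double substitution"
        return "miscellaneous"
    if m == n - 1:
        # Dropping one intended digit reproduces the recorded string.
        if any(intended[:i] + intended[i + 1:] == recorded for i in range(n)):
            return "single omission"
    if m == n + 1:
        # Dropping one recorded digit reproduces the intended string.
        if any(recorded[:i] + recorded[i + 1:] == intended for i in range(m)):
            return "insertion"
    return "miscellaneous"


# Illustrative pairs (intended, recorded); the digits are made up.
pairs = [("856037", "856032"), ("12345", "1245"), ("12345", "12435")]
tally = Counter(classify_copy_fault(a, b) for a, b in pairs)
```

A tally of this kind, taken over a large sample of entries, is what
would yield a percentage breakdown such as the one reported above.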


The conclusions of the overall study emphasize the heavy
contribution of so-called CONTENT and EVENT DESCRIPTION
MISTAKES to the residual rate, especially OMITTED entries.
They also emphasize the need to reduce message length and
complexity.


ON THE HUMAN SIDE OF DATA INPUT - OCR INPUTS 


The author frames the OCR ACCURACY problem in terms of 
trade-off between two forms of RECOGNITION ERRORS: 
rejecting GOOD DATA (handwritten, typewritten, printed), 
and accepting BAD DATA. 


The report refers to an installation where the document
reject rates caused by recognition errors were less than
6 %. In the light of the above framing of the accuracy
problem, this could mean that the 6 % includes both rejections
and acceptance of bad data and that the figure is in terms
of entries or characters. The author mentions another
installation where, by careful typewriting of originally
handwritten data, rejections at the equipment were negligible
while the error reject rate (presumably accepted data that
on subsequent processing proved to be wrong) zoomed to
35 %.




A MODEL FOR MEASURING THE INFORMATION PROCESSING RATES 
AND MENTAL LOAD OF COMPLEX ACTIVITIES 


The author suggests that there is an alternative way to 
look at the problem of HUMAN ERROR when regarding the 
human as a communication channel and information proces- 
sor. Van Gigch aims at the calculation of the total 
amount of information transmitted from input stimuli to 
output responses, and to the determination of an infor- 
mation processing rate which characterizes the mental 
content of the work performed. 


The calculation of information processing rates
can be applied to any industrial operation and
process, and is particularly well suited to jobs
where the degree of automation is such that the
physical aspect of work has been practically
eliminated.

The mental content of work, i.e. the total demand
it makes upon the worker, should appropriately
take into account both the complexity of the job,
as measured by the entropy or degree of variability
per step of cycle sequence, and the repetition
rate of the operation cycle, i.e. the number of times
the operation has to be performed in a given period
of time. Each of these two elements can be evaluated
separately and combined by means of the model in
a resulting informational load. This amounts to
measuring the mental content of work in terms of
information processing rates.
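
The combination of entropy per step and repetition rate can be
sketched numerically. The abstract does not give van Gigch's formula,
so the sketch below rests on two assumptions: that complexity is the
Shannon entropy of each step's outcome distribution, and that the
informational load is the sum of the step entropies multiplied by the
cycle repetition rate. Function names and figures are hypothetical.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of one step's outcome distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def informational_load(step_distributions, cycles_per_second):
    """Assumed model: (bits per operation cycle) x (cycles per second)."""
    bits_per_cycle = sum(entropy_bits(d) for d in step_distributions)
    return bits_per_cycle * cycles_per_second


# A made-up two-step cycle: a binary choice followed by a choice
# among four equally likely alternatives, i.e. 1 + 2 = 3 bits per cycle.
steps = [[0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
load = informational_load(steps, cycles_per_second=1.5)  # bits per second
```

Under these assumptions, a job of three bits per cycle repeated 1.5
times per second imposes a load of 4.5 bits per second.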


The reported research indicates that the rate of 7.5 bits
per second (peak) corresponding to an average sustained
rate of 4.5 bits, as defined through the proposed model,
might come to be considered as close to the maximum
capacity of the human communication/processing channel
in industrial jobs.


Although it would have been useful to determine the
level of ERRORS which accompanied different processing
rates in the study of some jobs in the forest
product industry, this information was NOT obtained.


Disregarding possible scientific-methodological problems
of the approach, one might assume that human error rates
exhibit important variation when the mental load approaches
what comes to be considered as the maximum capacity
of the human information channel. The approach might permit
taking into account the mental load of specific CODING
PROCEDURES used in translating so-called real world events
to the computer system language.


G.G. Neill Wright (1952)


THE WRITING OF ARABIC NUMERALS 


As reported by M. Jönsson in Mekanresultat 71008 (1971),
one of the author's reported investigations consisted
in having 93,320 Arabic numerals written by 352 people
and read by 130 people. Out of these numerals, 1,579
digits were confused with others in reading (mostly
confusions between 0 and 6), leading to an overall error
rate of about 1.7 %. Jönsson presents a table on the
nature and frequency of the transpositions found.


Besides some other data illustrating the possible influence
of digits on the perception of those following them,
Jönsson refers to another of Wright's investigations, aimed
at determining the frequencies of unreadable and ambiguous
digits in the reading of 44,250 digits which were
written by 212 people. A table shows that 0.5 % of the
digits were UNREADABLE and 2.2 % were AMBIGUOUS, leading
to what we might call a TOTAL ERROR RATE of about 2.7 %.


This last mentioned investigation also indicates that
the digit 4 was the most frequently found to be unreadable,
0 and 6 were the most frequently ambiguous, while
1 and 4 were the least frequently ambiguous. No explicit
recommendations are given on how to use these findings
in the design and operation of EDP systems.


CASE STUDY ON DIFFERENCES BETWEEN
PERPETUAL INVENTORY RECORDS 
AND ROTATING INVENTORY COUNTS 
of completed parts in stock in a manufacturing 
company. 


INTRODUCTION 


This study refers to the completed parts stock of
a company manufacturing electro-mechanical machines.
The company consists of, among other units, a PRODUCTION
UNIT and a CONTROLLER'S UNIT.

The former consists of several departments such as
Production Control, Purchasing, Shop Floor and Stores,
while the latter includes the Accounting dept., which
shares with Production Control the responsibility
for the accuracy of inventory control (stock figures).


[Organization chart, headed by the Plant Manager]

The operations of the plant are supported by interdependent
programs run on the local computer system,
and utilizing common files for purposes of inventory
control, operation scheduling, control of engineering
data etc.


The rotating inventory counts show that there are
differences between the quantity of parts that should
be found in stock, according to the program-maintained
perpetual inventory records, and the quantities reported
to be found through the rotating physical
counts. Such differences were often judged by auditors
and managers to be too great, especially in view
of the risk that the overall differences might be still
greater because of difficulties of estimation from
the counted sample.


This perceived danger motivated, in the course of
the years, the three investigations which will be
summarized here. They were done respectively by
the staff of the assistant plant manager (1964),
the staff of the Production Control manager (1968),
and by internal auditors (1969). This third investigation
by internal auditors can be said to have
been perpetuated in terms of the present classification
of causes of differences and in terms of the organization
of follow-up statistics which are presently
produced by a set of EDP application programs.


The clerical personnel performing the rotating inventory
counts (control) are physically located in
the stock room but report directly to Accounting.
Their findings are the source of information used
in producing the statistics analyzed in the present
context.


EXPLANATION OF SOME OF THE USED TERMS 


The purpose of the PERPETUAL INVENTORY, i.e. an
EDP-implemented model of the stock, is to have an
ACCURATE image of the flow of parts in the plant.
This is accomplished by maintaining a perpetual stock
record for each part in stock. This record is said
to show the entries into stock, withdrawals from
stock and the current balance, i.e. the number of
parts that are (supposed to be?) currently available
in stock.


The purpose of ROTATING INVENTORY CONTROL is to keep
a so-called "running check on the ACCURACY" of the
perpetual inventory records and to correct them when
necessary. This is done by having regular counts
made of various parts and comparing the actual count
to the perpetual inventory record. Minor differences,
or variances, are usually attributed to the use of
scales in counting and to the so-called human factor.
Greater differences are investigated for determination
of causes and proper correction. The label of "error"
may be given e.g. to those differences with a quantity
variance beyond plus/minus 5 %, or the value of
which exceeds U.S. $ 100.
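
The labelling rule just described can be written down as a short
sketch. The function name, the parameter names and the reading of the
5 % limit as "more than 5 % of the recorded quantity" are assumptions
made for illustration; the case company's actual rule may have differed.

```python
def classify_difference(recorded_qty, counted_qty, unit_value,
                        pct_limit=5.0, value_limit=100.0):
    """Label one rotating-inventory difference.

    Hypothetical helper: a difference is called an "error" when its
    quantity variance exceeds pct_limit percent of the recorded
    quantity, or when its money value exceeds value_limit.
    """
    diff = counted_qty - recorded_qty
    if diff == 0:
        return "no difference"
    value = abs(diff) * unit_value
    if recorded_qty != 0:
        pct = abs(diff) / abs(recorded_qty) * 100.0
    else:
        # Any difference against a zero record is treated as unbounded.
        pct = float("inf")
    if pct > pct_limit or value > value_limit:
        return "error"
    return "minor variance"


# Made-up examples: (recorded, counted, unit value)
small = classify_difference(1000, 980, 0.5)      # 2 %, value 10
large_pct = classify_difference(1000, 900, 0.5)  # 10 %
costly = classify_difference(1000, 990, 20.0)    # 1 %, value 200
```

Note that the two criteria are independent: an expensive part can
produce an "error" at a quantity variance well below 5 %.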


The operation of rotating inventory (RI) control is
performed by RI-clerks who each morning visit the
locations in stock where there are parts they intend
to count. The clerks mark these locations by leaving
in the stock bin a well visible "control card", that
is later picked up when the clerk returns in the
course of the counting tour. Stores personnel are
expected to indicate on the card all transactions
taking place prior to the control count by the
RI-clerks, in order to enable the count result to be
reconciled back to the previous night's closing
balance.


Here follow some selections from our case study,
chosen in order to illustrate the
issue of accuracy, or quality of information. The
study consisted in assembling and organizing the
results obtained by the three special investigations
on inventory differences. It must be noted that our
purpose was not to make our own investigation of the
causes of differences but rather to evaluate the
traditional practical way of approaching the problem
of accuracy in a specific, supposedly simpler,
very concrete and realistic environment. This implies
also that the material presented below does not pretend
to have been gathered according to any precepts
of scientific methodology: it is rather evidence
of traditional investigation technique or
trouble-analysis in an industrial environment. In any
case this material does not supply complete evidence,
since some details of our study were omitted here
because they are not required for the present
purpose.


The investigator examined every day, during a period of
some weeks and for a set of selected parts, the cause of
differences detected through the reports of the RI-clerks. He
summarizes his findings in the following table:


CAUSES                               NUMBER OF     VALUE IN MONEY
                                     CASES           +          -

1. Placement of parts in the
   stock-room                            3           -       28.852
2. Placement of "control card"           3           -        3.480
3. Erroneous counting                   10        16.266     75.048
4. Erroneous date                        2        18.875         75
5. Misunderstanding of verbal
   information                           4        35.000     11.547
6. Handling of invoices etc.,
   e.g. punch error                      2           370          6
7. Unidentified causes                   2           -          -

Totals during investigated period                 70.511    119.008
Gross differences                                189.519
Net differences                                   48.497
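
The totals of the table can be re-derived from its rows. The sketch
below assumes that the points in the money figures are thousands
separators and that a dash stands for zero; the variable names are
chosen here for illustration only.

```python
# (cause, number of cases, positive value, negative value)
rows = [
    ("Placement of parts in the stock-room", 3, 0, 28852),
    ('Placement of "control card"', 3, 0, 3480),
    ("Erroneous counting", 10, 16266, 75048),
    ("Erroneous date", 2, 18875, 75),
    ("Misunderstanding of verbal information", 4, 35000, 11547),
    ("Handling of invoices etc., e.g. punch error", 2, 370, 6),
    ("Unidentified causes", 2, 0, 0),
]

cases = sum(r[1] for r in rows)
positive = sum(r[2] for r in rows)
negative = sum(r[3] for r in rows)

# Gross: positive and negative differences are added without sign.
gross = positive + negative
# Net: positive and negative differences are allowed to cancel.
net = abs(positive - negative)
```

The contrast between the two figures is the point of reporting both:
the net difference is only about a quarter of the gross difference,
because overages and shortages largely offset each other.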


The investigator does not summarize his findings in a table. 
A review of his report, however, reveals that he has found 
the following causes (values of differences are not reported 
here) 


1. Multiple stock locations, but only one was reported
2. No stock location was assigned to the part
3. Error committed because personnel was inexperienced
4. The "control card" was not properly placed by RI-clerk
5. Control card was placed, but not used by stock personnel
6. P.I. (perpetual inventory) balance not filled on manually
   generated RI-control card (see note 1 below)
7. Partial delivery was reported as complete delivery


(1969): THIRD INVESTIGATION

We said earlier that this third investigation was made by
internal auditors. We mean more specifically that they
organized the scheme for classification of errors and
recommended the types of desirable follow-up statistics on inventory
differences and their causes. In this sense we can add that
the third investigation became a running investigation since
it is continuously performed up to now.


A year-end summary of this running investigation consisted
of a table including the following causes and percent
figures (percent out of a year total of about 900 found causes):


CAUSES                                                  PERCENT OF
                                                        CASES

 1. Part out, but was not reported out (of stock)            5
 2. Reported out, but in fact still in (stock)               9
 3. Part in, but not reported in                            13
 4. Reported in, but still out                               1
 5. Partial delivery, reported as complete (see note 2)      8
 6. No delivery, reported as complete (see note 3)           9
 7. Wrong card punch, in delivery-out                        4
 8. Wrong card punch, in delivery-in                         1
 9. Error in handwritten transaction                         6
10. Error in the reporting of stock location                 1
11. Wrong count, delivery of wrong quantity                 40
12. Other                                                    6

Total (corresponding to about 900 found causes)            100


Note 1. RI-control cards are normally computer generated by means
   of a program following the schedule: each part at least
   one RI control per year, high-value parts 4 times per year.
   On manually generated control cards, however, if the last
   PI (perpetual inventory) balance is not handwritten in the
   appropriate field of the card, it will not be punched and
   the EDP program will calculate the new balance as the PI
   balance before the RI control PLUS the quantity found in
   stock on occasion of the control.

Note 2. The stock clerk forwarded the pre-punched card generated
   by the computer for stock requisition, without considering
   the fact that he had found only part of the punched
   quantity. The card should have been marked, corrected or changed.

Note 3. Inability to deliver because of a stock-out condition
   requires that the computer-generated stock-requisition card
   be especially marked before forwarding to the
   computer center for data processing. If not, the pre-punched
   card will be processed under the assumption that the
   delivery of the pre-punched quantity was done.
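
The failure described in note 1 can be made concrete with a small
sketch. The actual EDP program is not documented in the source; the
update rule below (adjusting the file balance by the difference
between the counted quantity and the balance punched on the card, with
an unpunched field read as zero) is an assumption that merely
reproduces the reported behaviour. All names are hypothetical.

```python
from typing import Optional


def updated_balance(pi_balance: int, counted: int,
                    balance_on_card: Optional[int]) -> int:
    """Assumed reconstruction of the PI update after an RI control.

    balance_on_card is None when the field was left blank on a
    manually generated control card and therefore never punched.
    """
    # Assumption: a blank field is read as zero by the program.
    card_balance = 0 if balance_on_card is None else balance_on_card
    adjustment = counted - card_balance
    return pi_balance + adjustment


# Computer-generated card: the balance is pre-punched, so the file
# is corrected to the counted quantity.
ok = updated_balance(pi_balance=120, counted=118, balance_on_card=120)

# Manually generated card with the balance field left blank: the
# counted quantity is ADDED to the old balance, nearly doubling it.
faulty = updated_balance(pi_balance=120, counted=118, balance_on_card=None)
```

A validation step that rejects control cards with an empty balance
field would, under these assumptions, prevent the inflated balance.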


Let us now go over to a summary of the contents of follow-up
statistics, manually and computer generated, administered by
Accounting and distributed to responsible managers and other
personnel with the purpose of enabling improvements in the
accuracy of inventory records.


SUMMARY OF CONTENTS OF FOLLOW-UP STATISTICS ON
INVENTORY DIFFERENCES, ORIGINATED ON OCCASION OF 
THE THIRD INVESTIGATION (1969). 


end of the current year:

1.1. Actual number of performed controls versus planned number
     (e.g. all parts are to be counted at least once per year)

2. Results of RI activity - RI differences per month, year-to-date

2.1. Value of positive differences
2.2. Value of negative differences
2.3. Value of net differences
2.4. Value of gross differences
2.5. PI balance value of all RI controlled parts
2.6. Gross value of RI differences in % of 2.5.
2.7. Net value of RI differences in % of 2.5.
2.8. Number of accepted RI controls
2.9. Out of 2.8. above, number with value higher than limit
2.10. Percent value that 2.9. is of 2.8., i.e. percent of
     accepted RI controls with value of difference higher than
     limit, e.g. 100 money units.
2.11. Sums of the above, or accumulated value, for each one,
     each month, y-t-d.
2.12. Same as 2.11. but for past year (for comparison).

3. Acceptance of RI controls:

3.1. Total number of RI controls (both new and repeated for
     the same part number) performed this month, past month
     and y-t-d.
3.2. Number of accepted RI controls and what percentage they
     are of corresponding total number of RI controls as per
     3.1. above.
3.3. Number of accepted RI controls with value of difference
     greater than 100, and what percentage they are of the
     number of accepted controls (3.2.)
3.4. Out of 3.2. and 3.3., number of those with value of
     difference greater than 500.


fication of the following figures:

4.1. Value of positive difference
4.2. Value of negative difference
4.3. Value of net difference
4.4. Value of gross difference
4.5. PI balance
4.6. Sums of the above for all part numbers in the report
4.7. Sum of gross differences in % of sum of PI balances
4.8. Sum of net differences in % of sum of PI balances
4.9. Display of 2.1. to 2.7. above, for the current month,
     to allow the reader's comparison with corresponding
     figures in 4.6. above.
4.10. Percent value that figures in 4.6. are of related values
     in 4.9.


5.1. Number of distinct part numbers with open (i.e.
     not yet accepted) negative balances at end of each month,
     y-t-d.

5.2. Money value of negative balance (sum for all part numbers
     in the referenced month).

5.3. Percent of part numbers for which causes of difference
     were found during the referenced month (i.e. did not have
     to be "accepted" without cause)


6.1. Number of distinct part numbers with open negative
     differences at end of referenced week.
6.2. Money value of the negative differences.

7. Negative balances - other than above


7.1. Number of part numbers that during a referenced month 
showed some negative PI balance. 

7.2. Average per day of that month, calculated from 7.1 above. 

7.3. Money value of 7.1 above. 

7.4. How many distinct part numbers, during the referenced
     month, showed a negative PI balance, during how many weeks
     before correction (reconciliation with knowledge of cause)
     or acceptance (reconciliation without knowledge of cause).
7.5. List of particular part numbers that show negative PI
     balances at the end of the month, not having been yet closed.

7.5.1. For the above: for each part number, the number 
itself, name of the part, quantity of the differen- 
ce and its money value. 

7.6. Diagram over negative balances - curve showing the deve- 
lopment of the variable defined in 7.2., for each month 
y-t-d. 

8. Repeated RI controls


8.1. Curve showing the development per week y-t-d of the
     percentage that repeated RI controls represent of the "first
     time" RI controls. (Objective may be e.g. 10 % for
     current year).

8.2. Money value of the repeated RI counts above. 

9. Causes of differences


9.1. For each cause-code, the number of part numbers whose in- 
vestigation led to correction of differences attributed 
to respective cause. 

9.2. The percent of all causes that each particular cause stood 
for. 

9.3. The percent distribution of causes y-t-d for this year 
and past year (for comparison purposes). 


We shall now go over from this "EDP-oriented" summary of the
quality of inventory records to the background of these quality
problems: so to say, the "causes of the causes" of the differences,
i.e. errors that were found in the course of the investigations.


Such errors were not assembled and organized for analysis with
nearly the same degree of formalization as the above statistics.
A major part of our study consisted in identifying and gathering
descriptions of errors from the three investigations, deleting
as far as possible duplicates of the same descriptions, and trying
to keep each description in the same words as used
by the original investigator.


SUMMARY OF ERRORS
IDENTIFIED AND DESCRIBED IN THE COURSE OF THE 
INVESTIGATIONS, LEADING TO INVENTORY DIFFERENCES 


1. WRONG CODE was used for the particular stock transaction.
   Such transaction codes are used in related cost-accounting
   procedures and vary with the origin/destination of deliveries
   to or from stock. A wrong code may unintentionally generate
   twice as many transaction cards as actually required, leading
   to secondary errors such as negative balances etc.

2. DELAYED PARTS arrive physically after close-out of an earlier
   inventory difference. In this way the earlier "correction"
   of a difference without knowledge of its real cause causes
   a new difference.

3. ERRONEOUS DATE. A set of parts is being manufactured on the
   shop floor: as soon as the first two pieces are completed,
   they are transported to stock. The stock clerk, however,
   waits to report their arrival in stock until the rest
   of the set arrives, since the pre-punched transaction card
   accompanying the first parts refers to the whole quantity of
   the set (same job number). In the meantime a stock requisition
   arrives for one of the two pieces already physically in stock
   and it is delivered with a transaction of its own, leading
   to e.g. a negative balance in the PI file.

4. WRONG COUNT. Missing one box out of many boxes stacked on
   each other, where a great number of parts is packaged in the
   missed, hidden box.

5. WRONG COUNT. Assuming that one box behind or below many
   unopened boxes is also unopened and contains a definite number
   of parts, while this is not true.

6. WRONG COUNT. Assuming that the quantity in a box is the
   quantity declared by the vendor or printed on the box. Sometimes
   there are instructions forbidding the opening of boxes except
   in certain circumstances, because of contamination problems
   or difficulty of later controllability, e.g. in rotating
   inventory control.

7. QUANTITY EXCHANGED with department number when manually
   filling out a stock-requisition card. The wrong "quantity"
   exceeds the physical stock balance, resulting in an unexpected
   stock-out. This leads to detection of the mistake at the moment
   of delivery, with the result that the originally intended
   quantity is actually delivered, but the requisition card is not
   corrected.

8. PART NUMBER EXCHANGED with another while copying from a
   document where both appear near each other.

9. WRONG PUNCH of quantity 100,001 instead of the intended 1;
   same for part number 856032 instead of the intended 856037
   (unclear handwritten digit 7).

10. WRONG PART DELIVERED against a correctly filled requisition.

11. PARTS ARE NOT FOUND because they are placed at locations
   that are not yet numbered because of shortage of manpower.

12. PARTS ARE NOT FOUND because they are placed at stock
   locations which were not reported as intended locations for that
   particular part number.

13. PARTS ARE NOT FOUND because located in a "third" stock
   location. The EDP stock-updating application allows for
   registration of a maximum of two stock locations. Additional ones
   must be tracked by means of manual methods.

14. PARTS ARE NOT FOUND because too many different parts are
   stocked at the same one numbered stock location, and it is easy
   to overlook them.

15. CONTROL CARDS are not placed in certain stock locations
   because they are kept locked early in the morning for security
   reasons.

16. CONTROL CARDS are not filled by stockroom personnel. They do
   not notice them when expediting some parts requisitions, or
   they are not motivated to fill them, or they are not instructed
   to do so. The RI personnel sometimes forget to pick up
   at the end of the day those cards placed in locations which
   they intended to visit but had no time left to. This has
   occasionally spoiled the confidence and motivation of stockroom
   personnel. On the other hand such follow-up of left-over
   cards places an additional unappreciated burden of
   clerical duties on the RI personnel.

17. CONTROL CARDS. Stockroom personnel forget to fill them.
   Compare with number 16 above.

18. WRONG COUNT. The number of parts physically delivered from
   stock is not the same as the number on the requisition.

19. MIXING OF SIMILAR PARTS. Upon closer examination, as for
   quality control purposes, it is discovered that an open box
   actually contains two different parts of similar appearance.
   Several prior causes may be imagined.

20. MISUNDERSTANDING OF VERBAL INFORMATION in the course of
   indirect observations, as when the question or the answer is
   misunderstood regarding the date of arrival or the quantity
   of certain parts or boxes.

21. WRONG STOCK LOCATION is reported because the numbering
   system for stock locations is misunderstood by inexperienced
   personnel.

22. PARTIAL DELIVERY REPORTED AS COMPLETE since the pre-prepared
   transaction is not changed or complemented with an additional
   transaction upon verifying that the observed event does
   not conform to the planned event.

23. PI BALANCE NOT FILLED on manually generated control card,
   since this is normally not necessary with computer-generated
   cards where such information is prepunched by the EDP
   application. The updating program then calculates the new
   balance as the last calculated in the PI file plus the balance
   reported by the RI count on the manually generated card.

24. NO DELIVERY REPORTED AS COMPLETE. When stock personnel are
   unable to deliver a single piece of a requisitioned quantity
   because of a stock-out condition (zero quantity in stock),
   the requisition card should be especially marked and put
   apart for special EDP handling (emergency because of danger
   of line-stop). If the special handling-marking is not
   performed, the system assumes that the whole quantity was
   indeed delivered.

25. LOSS OF DOCUMENT in handling, as when an invoice is put among
   other kinds of documents or forgotten at the bottom of a box
   which was opened for control of the quantity of parts in it.

26. PARTS ARE NOT FOUND. A "third" stock location was reported
   to the system in the belief that it was the second one. The EDP
   program accepts only a maximum of two locations for the same
   part number. Upon reporting of the third one, the whole
   record for the first location was lost (erased).

27. WRONG IDENTIFICATION of the part - misunderstanding. The
   unit of a certain printed label was occasionally believed to
   be the label itself, a foil with a set of many of the
   labels glued on it, or a set of such foils.


28. WRONG COUNT. Small parts which are delivered in great
   quantities are counted indirectly by weighing them and relating
   the total weight to the unit weight. This introduces scale
   errors and related human factors. One of the investigators
   suggests that a percent difference in quantity up to about
   3 or 4 % could be normally ascribed to scale and such human
   factors.

29. EXCHANGE OF MEASUREMENT UNITS. A very long cable arrives in
   a box marked with "length = 550" and it is assumed to refer
   to meters while it later proves to have been feet.


Note: No investigation refers to another remarkably obvious
source of differences which we will note for the sake of
completeness:
THEFT. Equivalent to a lie or deliberately given false
information.


HISTORY OF QUALITY IN MANUFACTURING


Technological stability in industrial operations: 
historical background of quality in manufacturing. 


Since the earliest days of man, artisans, engineers, and
industrial administrators have undertaken the development
of certain aspects of manufacture such as production
method, production rate, and product quality, with the
general aim of GETTING MORE OUTPUT, in some sense, for
a given input. The most highly publicized and the most
widely practiced of these techniques have been ascribed
to F.W. Taylor, who emphasized the planning of productive
effort in such a manner that the outcome, output, of
this effort was PREDICTABLE IN TERMS OF QUANTITY. Although
Taylor also had in mind the QUALITY of product,
in perhaps some vague sense, he was primarily concerned
with predictions of quantity.


Taylor stressed the ELEMENTIZING of operations and modifying
methods. He further stressed the elimination of
worker initiative and he proposed manufacturing procedures
to better guarantee high output of mass production.
While Taylor did stress wage payments in relation to rates
of output, he seemed primarily concerned with establishing
standardized RATES OF PRODUCTIVITY. And, in spite
of his stress on standard production methods and his
monumental technical job in "The Art of Cutting Metals",
the heritage of his influence is largely to be found in
the superabundance of persons in industry engaged in
setting up rates of production based in part on time
measurements, in part on individual judgement, and in part
on collective bargaining. For a half century or more,
disciples of Taylor and other propounders of "efficient"
manufacturing procedures were concerned with devising
"methods" by which they could predict MAXIMUM OUTPUT for
given input (production RATES), with some vague notion
of the "one best method" and so-called "fair day's work".


About the year 1925 it was openly realized that PRODUCT
QUALITY HAS A DEFINITE BEARING ON OUTPUT IN THAT A PRODUCT
WHICH DOES NOT CONFORM TO DESIGN SPECIFICATIONS CANNOT
BE COUNTED IN THE OUTPUT. Product that is scrapped or
reworked reduces the overall production rate. Also, if
considerable inspection of product is necessary, the overall
man-hour input for the accepted product is increased.


About that time, W.A. Shewhart, of the Bell Telephone
Laboratories, recognized the fact that ATTAINMENT OF
SPECIFIED PRODUCT QUALITY IS A FUNDAMENTAL PROBLEM OF
SCIENTIFIC METHOD, A PROBLEM OF PREDICTION. Dealing with the
problem of quality in mass-manufactured products, he
recognized the INHERENT VARIABILITY IN REPETITIVE PROCEDURES
and formulated a set of ideas which yielded operationally
verifiable criteria for the attainment of specified
product quality. He also noted that such criteria
can be established only within the framework of an ACCEPTED
GOAL OR SET OF CONSTRAINTS. This goal was essentially
economic in nature, in terms of impact of quality on cost
of input and VALUE OF OUTPUT.

Prediction of a quality characteristic within LIMITS
was considered possible when a "constant system of chance
causes" exists, or equivalently when "assignable causes"
do not exist. The latter were those which could be
ECONOMICALLY identified and eliminated. Criteria for
discrimination between the two types of causes were based
on principles of statistical inference and associated
precepts of probability ("STATISTICAL CONTROL").
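
The discrimination between chance causes and assignable causes can be
sketched with simple control limits. The sketch below is an
illustration, not Shewhart's own procedure: it places the limits at
three sample standard deviations around the mean of a baseline period,
whereas practical control charts usually estimate the spread from
subgroup ranges. The function names and the data are hypothetical.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from baseline measurements.

    Assumes the baseline period reflects a constant system of chance
    causes; k = 3 is the conventional multiplier.
    """
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - k * sigma, center, center + k * sigma


def out_of_control(values, limits):
    """List (index, value) pairs lying outside the control limits.

    Such points are candidates for an assignable cause worth
    investigating; points inside are attributed to chance.
    """
    lower, _, upper = limits
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Made-up measurements of some quality characteristic.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9]
limits = control_limits(baseline)
flagged = out_of_control([10.05, 9.95, 10.6, 10.0], limits)
```

With these figures only the third new measurement falls outside the
limits and would prompt a search for an assignable cause.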


Fundamental to the attainment of quality, i.e. to the
attainment of a state of the production process wherein
it is possible to PREDICT WITHIN SPECIFIED LIMITS the
quality that will be realized, is the following three-step
continuing sequence as conceptualized by Shewhart:


1. QUALITY SPECIFICATION. It is the HYPOTHESIS of
the quality to be obtained.


2. PRODUCTION METHOD OR THE PROCESS. It is equivalent
to the EXPERIMENT in science, whose results are to
be examined to determine whether the hypothesis is
verified in fact.


3. QUALITY EVALUATION, equivalent to the TEST OF
HYPOTHESES in science. The results of the production
process are inspected or tested and the inspection
or test results are evaluated to determine
whether the specified quality has been attained.


Until about the middle of the forties, statistical inference
had only rarely been applied in testing hypotheses
in the engineering sciences. Criteria of acceptance of
physical hypotheses had usually been the JUDGEMENT of the
individual engineer or scientist. While manufacture and
scientific inquiry are quite parallel in respect to
experimental inference, the requirements of attaining quality
in mass production differ, in that FAILURE MAY DESTROY
THE MANUFACTURING ACTIVITY. The failure to attain predicted
quality may mean SUCH LOSS AS TO PREVENT FURTHER
MANUFACTURE.


The three-step sequence in attaining industrial quality,
therefore, must be continuing and self-corrective and
must lead to the realization of a constant chance cause
system in the production process whereby the desired
quality can be assured.


(S.B. Littauer, 1950)


BASIC CONCEPTS OF QUALITY IN MANUFACTURING


THEORY OF ERRORS AS VERIFICATION THROUGH PROBABILITY.


VERIFIABILITY requires that any theory predict certain 
numbers which can be compared with the numbers gained 
by actual operations of measuring. 


In actual practice these numbers, which may be called 
THEORETICAL MEASURABLES and OPERATIVE MEASURABLES res- 
pectively, never correspond. It becomes necessary, there- 
fore, for the scientist to SPECIFY WHEN THE DEVIATION 
BETWEEN THEM IS SUCH THAT VERIFICATION OCCURS. These 
specifications are defined by the THEORY OF ERRORS in 
which the concept of probability has an essential place. 
(Northrop, 1947, p.200) 


ACCURACY AND PRECISION IN THE THEORY OF ERRORS. 


In the theory of errors we customarily assume that we 
may repeat the measurement of the length of e.g. a line 
AB (a segment), again and again at will, obtaining an 
infinite sequence of observations 


X1, X2, ..., Xi, Xi+1, ...


We then assume that the segment AB has a TRUE length X'
which is constant for all time. Then we introduce the
concept of an ERROR e'i of a SINGLE observation Xi, as
defined by the relation


e'i = Xi - X'


Now we come to the question of what is the meaning of 
the ACCURACY of the METHOD OF MEASURING the length of 
the segment AB by means of an engineer's scale. One of 
the things that are done in the theory of errors is to 
assume that the infinite sequence above has a LIMITING 
AVERAGE VALUE X̄' which defines the CONSTANT ERROR


Δ' = X̄' - X'


This constant error provides a kind of measure of the 
ACCURACY of the TEST METHOD or METHOD OF MEASUREMENT in 
somewhat the same way as e'i above provides a measure 
of the accuracy of the SINGLE OBSERVATION Xi. 

Usually, however, we go further and conceive of the
accuracy of a given method of measurement as being
determined by the frequency of occurrence of the numbers
in the infinite sequence above, within some specified
RANGE X' - L1, X' + L2. If we make L = L1 = L2 then
the distance L may be associated with the concept of
PROBABLE ERROR.


PRECISION seems to differ from the concept of accuracy,
principally in that the clustering of the members in the
infinite sequence is measured in terms of the fraction
of these members within the range X̄' - L, X̄' + L, this
range being related to the average X̄' of the infinite
sequence instead of the TRUE VALUE X' being measured.


In the context of manufacturing, a SPECIFICATION may be 
seen as fundamentally the statement of requirements as 
means to an end, which we idealize in terms of the 
classic concepts of accuracy and precision.


ACCURACY involves in some way the difference between 
what is observed and what is TRUE. 


PRECISION involves the concept of REPRODUCIBILITY of 
what is observed. 


We could then say that accuracy is a measure of
correctness, while precision is a measure of
reproducibility. (Shewhart, 1939, p.124, 146)
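
The distinction can be illustrated with a small simulation.
The Python sketch below is only an illustration under stated
assumptions: the true length, the bias, the normal scatter
and the half-width L are all hypothetical, and a finite
sample stands in for the infinite sequence of the theory.

```python
import random
from statistics import mean

def constant_error(observations, true_value):
    """Sample average minus the true value (finite stand-in for the limiting average)."""
    return mean(observations) - true_value

def fraction_within(observations, centre, half_width):
    """Fraction of observations lying in [centre - L, centre + L]."""
    inside = [x for x in observations if abs(x - centre) <= half_width]
    return len(inside) / len(observations)

random.seed(0)
true_length = 100.0  # hypothetical true length X' (not knowable in practice)
bias = 0.5           # hypothetical constant error of the method
observations = [true_length + bias + random.gauss(0.0, 0.1) for _ in range(10_000)]

L = 0.3
# Accuracy: clustering about the TRUE value.
accuracy_fraction = fraction_within(observations, true_length, L)
# Precision: clustering about the AVERAGE of the observations.
precision_fraction = fraction_within(observations, mean(observations), L)

print(constant_error(observations, true_length))
print(accuracy_fraction, precision_fraction)
```

With these assumed numbers the method is precise (nearly all
readings lie within L of their own average) but not accurate
(few lie within L of the true length).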


ESTABLISHMENT OF TOLERANCE LIMITS, 
AND "MEASUREMENT ERROR". 


When speaking of tolerance limits in terms of MEASURE- 
MENTS of some quality characteristic, it is often taci- 
tly assumed that the measurements themselves are "RIGHT" 
or "TRUE". Obviously, however, this assumption may not 
be justified and hence we need to take into account the 
DIFFERENCE BETWEEN THE CUSTOMARILY ACCEPTED CONCEPT OF 
THE TRUE VALUE X' of a physical quality, AND A MEASU- 
REMENT X OF THIS TRUE VALUE. (Ibid. p.71) 


In practice, however, we cannot discover the "true value": 
we can simply make measurements and draw inferences from 
such measurements ABOUT OTHER MEASUREMENTS NOT YET MADE, 
if we are to limit ourselves to inferences that can be 
operationally verified. (Ibid. p.87) 


The concept of TRUE VALUE leads us to CHOOSE operationally 
verifiable criteria that measurements must satisfy in 
order that they MAY BE CONSIDERED TO BE MEASUREMENTS OF 
THE TRUE VALUE X'. These criteria include those for 
CONTROL of any method of measurement (i.e. the sequence 
of measured values according to a given method must re- 
present a statistically controlled condition), and those 
for checking the consistency between measurements by 
DIFFERENT METHODS (i.e. the statistical limits of the 
averages of the first n terms of the sequences from 
different methods, as n approaches infinity - must be 
equal). IN PRACTICE, IT IS CUSTOMARY TO CHOOSE ONE OF 
THE METHODS OF MEASUREMENT AS A STANDARD, AND TO CONSIDER 
PRACTICALLY VERIFIABLE OPERATIONAL MEANING FOR THE RE- 
QUIREMENTS OF CONSISTENCY. (Ibid. p.72) 


As a final note, it should be understood that the setting
of tolerance limits on the measurement of a so-called
physical constant (such as the velocity of light) is
analytically the same problem as setting tolerances
on the true value of quality of pieces of a product of
a given kind. The tolerance limits on a quality must
take into account not only the variability of the "true"
quality, but also of the method of measurement. HENCE,
THE PROBLEM OF SETTING TOLERANCES ON THE MEASUREMENT OF
A PRESUMABLY CONSTANT VALUE OF A GIVEN QUALITY ALWAYS
CONSTITUTES A PART OF THE JOB OF SETTING TOLERANCES ON
A QUALITY CHARACTERISTIC. (Ibid. p.116)
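
One simple way to see why both sources of variability enter
the tolerance is sketched below in Python. It assumes that
the measurement error is additive and independent of the
"true" quality, so that the two variances add; this model,
the multiplier k = 3 and the numbers are assumptions of the
sketch, not statements from the source.

```python
import math

def observed_sigma(sigma_quality, sigma_measurement):
    """Standard deviation of measured values under independent additive error."""
    # Variances add, so the standard deviations combine as a root sum of squares.
    return math.hypot(sigma_quality, sigma_measurement)

def tolerance_half_width(sigma_quality, sigma_measurement, k=3.0):
    """Half-width of a k-sigma tolerance range on the MEASURED quality."""
    return k * observed_sigma(sigma_quality, sigma_measurement)

# Hypothetical standard deviations, in the units of the quality characteristic.
sigma_quality = 0.04
sigma_measurement = 0.03

print(observed_sigma(sigma_quality, sigma_measurement))
print(tolerance_half_width(sigma_quality, sigma_measurement))
```

A tolerance computed from the variability of the quality
alone would be too narrow for the measured values.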


OPERATIONAL MEANING OF ACCURACY AND PRECISION 


The impossibility of determining a "true value" in the
sense of the theory of errors introduces the need of
an operational meaning for accuracy and precision:


We meet indefiniteness in the definition of accuracy as
a measure of CORRECTNESS; what measure is implied and
what is this degree of correctness that we are supposed
to measure?


Likewise for precision - AGREEMENT OF RESULTS AMONG
THEMSELVES is not definite because there is a large
number of senses in which results might be said to agree
among themselves. Precision as a measure of
REPRODUCIBILITY is definite only if we know what measure
is implied and if we know what is this measure of
reproducibility that we are to measure. Furthermore: to
what portion of the infinite sequence of measurements
with a given method do such statements as "agreement of
results among themselves" or the "reproducibility of the
observed values" refer?


When trying to give operational meaning to accuracy and
precision, the first thing to recognize is that there are
two aspects of an operation of measurement: the
quantitative-numerical (pointer reading), and the
qualitative-physical. They both are required for a
complete description of the operation of measurement.
Likewise the interpretation of experimental results must
take into account both aspects of the operation in order
to avoid ERROR OF JUDGEMENT based upon the observed
results.


Hence, to make any practically verifiable statement about 
a quality characteristic we must (at least): 


1. Specify each of the PHYSICAL operations of measurement 
to be considered. 

2. Specify the number of terms to be considered for each 
infinite sequence of observations corresponding to 
a method of measurement. 

3. Define the functions to be computed in terms of the 
set of observations. 

4. Specify for each such function the interval within
which the value of the function must lie if the
judgement-statement involving that function is to be
considered true.
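
The four requirements can be read as the fields of a small
record. The Python sketch below is one possible rendering;
the class name, field names and the sample readings are
hypothetical and serve only to make the four steps concrete.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence, Tuple

@dataclass
class JudgementSpec:
    method: str                                   # 1. physical operation of measurement
    n_terms: int                                  # 2. number of observations considered
    function: Callable[[Sequence[float]], float]  # 3. function of the observations
    interval: Tuple[float, float]                 # 4. interval the value must lie in

    def is_true(self, observations: Sequence[float]) -> bool:
        """Evaluate the judgement-statement on the first n_terms observations."""
        if len(observations) < self.n_terms:
            raise ValueError("not enough observations for this specification")
        value = self.function(observations[: self.n_terms])
        low, high = self.interval
        return low <= value <= high

# Hypothetical specification: the mean of 5 readings must lie in [9.95, 10.05].
spec = JudgementSpec(
    method="micrometer reading (hypothetical procedure)",
    n_terms=5,
    function=mean,
    interval=(9.95, 10.05),
)
readings = [10.01, 9.98, 10.02, 10.00, 9.99, 10.40]
result = spec.is_true(readings)
print(result)
```

Only the first five readings enter the judgement, because
the specification fixes the number of terms in advance.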


The OBJECTIVITY of a quality characteristic in terms of
the concepts of accuracy and precision will in any case
exist only in the CONSISTENCY BETWEEN THE INDEFINITELY
LARGE NUMBER (METHODS) OF POTENTIALLY INFINITE SEQUENCES
(OBSERVATIONS) constituting the numerical aspects of the
operations of different methods of measurement.


Finally it is important to note that there is not "the
one" verifiable operational meaning of ACCURACY and
PRECISION, but rather A CHOSEN such meaning. However
we are not free to choose arbitrarily ANY verifiable
meaning since we must limit ourselves to those
alternatives that are ECONOMICALLY ATTAINABLE. In other
words, tolerance requirements for accuracy and precision
must be economic. (Shewhart, 1939, p.125-140)


OTHER ECONOMIC ASPECTS
IN THE CONCEPT OF TOLERANCE LIMITS


We may think of the "go, no-go" tolerance limits as con- 
stituting a means of screening a given product in res- 
pect to some quality characteristic. 


In this sense, TOLERANCE LIMITS ON A QUALITY
CHARACTERISTIC X fix the range within which the quality X
of a piece of product must lie in order to conform to
specification and in order to fit into some mechanism
that the engineer wants to make. The choice of the
tolerance limits depends then upon the particular design.


However, they will also be determined by the considera- 
tion of the percentage of the product made under commer- 
cial conditions that MAY BE EXPECTED to have a quality 
falling within that range. 


Another reason why the engineer under certain conditions
must be concerned not only with the tolerance range but
also with the PROBABILITY ASSOCIATED WITH THAT RANGE is
in the case of DESTRUCTIVE TESTS. If the inspection test
to determine whether the quality of a piece of product
lies within the specified tolerance range is destructive,
then it is only through a KNOWLEDGE OF EXPECTED
VARIABILITY of quality that an engineer can determine
what assurance he has that the quality lies within its
tolerance limits.
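
As a rough illustration of a probability associated with
a tolerance range, the Python sketch below assumes that the
quality is normally distributed with a known mean and
standard deviation, i.e. that the process is in statistical
control. The normal model and all numbers are assumptions
of the sketch; the source does not prescribe a distribution.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def fraction_within_tolerance(lower, upper, mu, sigma):
    """Expected fraction of product with quality inside [lower, upper]."""
    return normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)

# Hypothetical process: mean 10.0, standard deviation 0.1.
p_wide = fraction_within_tolerance(9.7, 10.3, mu=10.0, sigma=0.1)
p_narrow = fraction_within_tolerance(9.9, 10.1, mu=10.0, sigma=0.1)
print(p_wide, p_narrow)
```

When the test is destructive, a figure of this kind is the
only assurance available for the pieces that are shipped
untested.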


So long as we think of a tolerance range simply as go,
no-go limits, our attention is centered primarily on
the limits themselves. However, just as soon as we begin
to consider the establishment of tolerance limits from
the viewpoint either of making EFFICIENT USE OF AVAILABLE
MATERIALS or of maintaining an adequate degree of QUALITY
ASSURANCE (especially needed when the inspection test is
destructive), we must think not only of the tolerance
limits but also of the probability associated with these
limits. (Shewhart, 1939, p.50-51)


BASIC CONCEPTS OF QUALITY IN PHYSICS


Measurement of some PROPERTY of a thing, of the
"fundamental physical constants", and of other basic
properties of nature, in practice always takes the form
of a sequence of steps or operations that yield as an end
result a number that serves to represent the amount or
quantity of some particular property of a thing - a
number that indicates how much of this property the thing
has, FOR SOMEONE TO USE FOR A SPECIFIC PURPOSE.


PRECISION AND ACCURACY are inherent characteristics of
the MEASUREMENT PROCESS employed and not of the
particular end result obtained.


ACCURACY is determined by the closeness to the TRUE
value that is characteristic of successive independent
measurements of a single magnitude generated by REPEATED
applications of the process under specified conditions.
The true value is defined conceptually by an exemplar
measurement process or the target value intended in a
practical measurement process. Accuracy may be measured
in terms of BIAS, or SYSTEMATIC ERROR, i.e. the magnitude
and direction of its tendency to measure something other
than what was intended. Strictly speaking, the ACTUAL
ERROR of a reported value, that is the magnitude and sign
of its deviation from the truth, is usually unknowable.
Limits to this error, however, can usually be inferred -
with some risk of being incorrect - from the PRECISION of
the measurement process by which the reported value was
obtained, and from REASONABLE limits to the POSSIBLE bias
of the measurement process.


Although the accuracy REQUIRED for a reported value de- 
pends primarily on the INTENDED use, or uses, of the 
value, one should not ignore the REQUIREMENTS OF OTHER 
USES to which it is likely to be put. A REPORTED VALUE
WHOSE ACCURACY IS ENTIRELY UNKNOWN IS WORTHLESS.


PRECISION refers to the typical closeness TOGETHER of
successive independent measurements of a single
magnitude generated by REPEATED applications of the
process under specified conditions. Precision may be
measured in terms of STANDARD ERROR of the reported
value, which measures (or is an index of) the
characteristic DISAGREEMENT of repeated determinations
of the same quantity by the SAME METHOD. The standard
error is the standard deviation of the probability
distribution of estimates (that is, reported values) of
the quantity that is being measured.
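
A minimal Python sketch of such a report follows. It assumes
that the reported value is the mean of n independent
determinations, so that its standard error is estimated by
s / sqrt(n), and it keeps the bound on possible bias as
a separate item instead of merging the two. The function
names, the readings and the bias bound are hypothetical.

```python
import math
from statistics import mean, stdev

def standard_error(observations):
    """Estimated standard error of the mean of independent determinations."""
    return stdev(observations) / math.sqrt(len(observations))

def report(observations, bias_bound):
    """Reported value with imprecision and possible bias stated separately."""
    return {
        "value": mean(observations),
        "n": len(observations),
        "standard_error": standard_error(observations),
        "bias_bound": bias_bound,  # assumed reasonable limit to systematic error
    }

# Hypothetical repeated determinations of one quantity by the same method.
readings = [5.02, 4.98, 5.01, 4.99, 5.00]
summary = report(readings, bias_bound=0.01)
print(summary)
```

Stating the components separately leaves each user free to
combine them in the way that suits his own use of the value.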


In general the purpose for which the result is needed
determines the precision and accuracy REQUIRED, and
ordinarily also the method of measurement employed. No
single form for stating credible LIMITS to likely
inaccuracy-imprecision is universally satisfactory. It is
important to give a detailed account of the various
components of imprecision and systematic error, so that
EACH INDIVIDUAL USER OF THE FINAL RESULT MAY DECIDE FOR
HIMSELF WHICH OF THE INDICATED COMPONENTS ARE, OR ARE
NOT, RELEVANT TO HIS USE OF THE FINAL RESULT.
(C. Eisenhart, 1968)


ORIGIN AND MEANING OF ACCURACY AND PRECISION


The application of the concept of "best decision" (as
it is commonly understood) to pure research requires
the evaluation of the losses (and gains) from falsely
(or correctly) rejecting a "pure" research hypothesis
or the evaluation of the losses due to ERROR in
estimating the value of a parameter WHEN THIS ESTIMATE
MAY BE USED FOR MANY PURPOSES OF WHICH THE RESEARCHER
CANNOT BE AWARE.


Since these evaluations do not seem possible, it appears 
that the pure researcher requires A CRITERION OF "BEST 
ANSWERS TO QUESTIONS" WHICH HAS NO REFERENCE TO OUTCOMES 
OF DECISIONS AND THEIR VALUES. 


Every concept of ERROR contains an implicit set of
assumptions concerning the value of the consequences.
From this we will not conclude that the pure researcher
must explicitly formulate consequences and their values -
for this he clearly cannot do in many circumstances - but
that HE MUST MEASURE AND REPORT ERRORS IN SUCH A WAY THAT
THEY CAN BE ADJUSTED TO SUIT CIRCUMSTANCES IN WHICH THE
VALUES OF CONSEQUENCES DIFFER FROM THOSE IMPLICIT IN HIS
MEASURE OF ERROR.


In the context of estimating the true value of a
parameter, ERROR MUST BE MEASURED IN A WAY WHICH DOES NOT
PRESUPPOSE KNOWLEDGE OF THE TRUE VALUE OF THE PARAMETER
BEING ESTIMATED. This is done by measuring properties of
the set of estimates yielded by an ESTIMATING PROCEDURE,
rather than by measuring the properties of any one
specific estimate.


Generality of scientific results - their applicability
over a wide range of conditions - is not possible with
any single estimated value of a parameter. DIFFERENT
ESTIMATES DERIVED FROM THE SAME DATA are required for
different circumstances. Consequently, the objective of
an estimating procedure should be to provide the
information necessary for PREPARING THAT ESTIMATE IN ANY
SPECIFIC SITUATION WHICH MINIMIZES THE EXPECTED COSTS OF
ERRORS DUE TO ESTIMATION.
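
How the same data can yield different estimates may be
sketched as follows. The Python example assumes a simple
linear cost per unit of overestimation and per unit of
underestimation, and it treats the observed values as an
empirical stand-in for the set of estimates yielded by the
procedure. The cost figures, the data and the function names
are hypothetical and are not taken from the source.

```python
def expected_cost(estimate, sample, cost_over, cost_under):
    """Average cost of using `estimate` if the truth were each sample value in turn."""
    total = 0.0
    for x in sample:
        error = estimate - x
        if error > 0:
            total += cost_over * error      # estimate too high
        else:
            total += cost_under * (-error)  # estimate too low (or exact)
    return total / len(sample)

def best_estimate(sample, cost_over, cost_under):
    """Sample value that minimizes the expected cost of error."""
    return min(sorted(sample),
               key=lambda c: expected_cost(c, sample, cost_over, cost_under))

# Hypothetical data: the same seven values serve three different users.
sample = [9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 10.4]

symmetric = best_estimate(sample, cost_over=1.0, cost_under=1.0)
avoid_low = best_estimate(sample, cost_over=1.0, cost_under=4.0)
avoid_high = best_estimate(sample, cost_over=4.0, cost_under=1.0)
print(symmetric, avoid_low, avoid_high)
```

The user for whom underestimation is costly is served by
a higher estimate than the user for whom overestimation is
costly, although both start from identical data.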


Ultimately, then, the best answer to a question is one 
which can be used in any problem situation to obtain a 
best solution. 


TRUTH AND ERROR OF INFORMATION HAVE NO MEANING INDEPEN- 
DENTLY OF THE WAY IN WHICH INFORMATION IS APPLIED. 
"Correspondence with reality" cannot be used to measure 
error, since reality is not known in a way which permits 
such computation. Information corresponds to reality in 
any specific situation to the extent that it can be used 
to accomplish somebody's objectives in that situation; 
that is, to obtain best solutions to problems. (p.61-63)


The TRUE measure of a given distance will be the limit
("stochastic limit") of an infinite set of observations,
all in "STATISTICAL CONTROL". When lack of control
results, the scientist changes his theory, so that
theory depends on observation, and yet no observation
can be made without some presupposed theory. (p.57)


All questions requiring a QUANTITATIVE answer (i.e.
a number of some sort) are not questions receiving an
immediate answer. For to measure anything, an instrument
of measurement is required, and all such instruments
presuppose the principles by means of which they
were constructed. Even discrete counting presupposes
laws of addition and certain principles of succession.
Similarly it can be shown that also questions concerning
qualitative relationships between objects cannot
be answered immediately since they presuppose the
answering of other questions. (p.121)


In the context of discussing experimentalism, Churchman 
describes the experimental process, usually called the 
PROCESS OF EXPERIMENTAL CONTROL. The nature of such 
control is formalized in order to describe science's 
way of approaching its ideal of absolute PRECISION. To 
summarize, an experiment is said to be CONTROLLED if we 
state all the formal conditions under which a mathemati- 
cal function of a series of observations approaches a 
limit stochastically. Such a definition of experimental
control is then made the criterion of MEANING: No
question of FACT can be said to have meaning unless there
exists a CONTROLLED EXPERIMENT for its answering. (p.182)


Granted the postulates of experimentalism, it is always
possible to find a formal image of nature that will
enable us to reduce the "ERROR", with an increase in
the number of observations, to a quantity less than
any given amount. Furthermore, the DEGREE OF PRECISION
(corresponding to an "error of the error") can also
be thought to be measurable. In terms of the basic
methodology of experimental science we can then define
the concepts that are fundamental to any theory of
knowledge, meaning, TRUTH, and REALITY. Two of the
concepts are:

1. The TRUE ANSWER TO A QUESTION OF FACT - is that 
single value for which the ERROR OF OBSERVATION is 
zero. 

2. The TRUE IMAGE OF NATURE - is that image which will 
produce EXPERIMENTAL CONTROL for all series of ob- 
servations, finite or infinite. (p.183) 
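
The claim that the error can be brought below any given
amount by taking more observations can be illustrated in
the simplest case. The Python sketch below assumes
independent observations with a constant standard deviation
sigma (a series "in control"), for which the standard error
of the mean is sigma / sqrt(n). The function names and the
numbers are assumptions made for illustration.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of the mean of n independent observations."""
    return sigma / math.sqrt(n)

def observations_needed(sigma, max_error):
    """Smallest n for which sigma / sqrt(n) does not exceed max_error."""
    return math.ceil((sigma / max_error) ** 2)

sigma = 2.0  # hypothetical standard deviation of a single observation
for target in (1.0, 0.5, 0.25):
    n = observations_needed(sigma, target)
    print(target, n, standard_error_of_mean(sigma, n))
```

Halving the admissible error quadruples the number of
observations required, which is one reason the ideal can
only be approached and never reached.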


Progress in the accomplishment of the scientific pur- 
pose may be measured by the reduction of the ERROR OF 
MEASUREMENT. The ideal of errorless measurement can 
only be approached by taking observations in indefini- 
tely increasing number, and there is a constant demand 
for the experimenter to decide whether the ideal is ap- 
proached satisfactorily, i.e. whether the observations 
are "IN CONTROL". (p.267)


PRECISION is one of the needs satisfied by
STANDARDIZATION in the context of measurements. This is
the need to DIFFERENTIATE ASPECTS OF THE WORLD WE LIVE
IN. The planning of a large meeting only demands a
rough notion of the size of the crowd, say, between
2000 and 3000, in order to select a meeting hall
economically; but the planning of a dinner meeting
requires much greater precision. (p.90)


Without standards, one would have to report all the
relevant information about time, place, observers,
procedures, etc., in addition to the DATA REPORT itself.
Otherwise, no one would know what values to
assign to the variables in the laws that enable one
to use the report IN OTHER CIRCUMSTANCES. But once
a standard has been given, then all data reports can
be adjusted to the standard, and all that is needed
is the data report itself. THUS, THE STANDARD
CONDITIONS CONSTITUTE A DATA-PROCESSING DEVICE THAT
SIMPLIFIES THE AMOUNT OF REPORTING REQUIRED. (p.91)


The aim of minimizing the effort to adjust data usually
CONFLICTS WITH THE AIM OF PRECISION. In effect, the
"cost" of adjusting data rises as more precision is
attained, just as the cost of absence of precision
goes up as we attempt to find "simpler" data. Experience
has shown that it is possible to be naive with respect
to precision in an attempt to be SIMPLE IN PROCEDURES.
ALL OF THE SUPPOSEDLY "SIMPLE" INSTANCES -
A REPORT OF A WITNESS, OF A LABORATORY TECHNICIAN, OF
A STOCK CLERK - ARE NOT SIMPLE AT ALL IF THE DECISION
ON WHICH THEY ARE BASED HAS ANY IMPORTANCE. Many
"checks on the accuracy" of the data amount to setting
up standards to which the data can be adjusted. (p.90)


Besides standardization, two other most important
aspects of measurement are the accuracy of the
measurements and the control of the measurement process.


ACCURACY is itself a measurement - the measurement of
the DEGREE TO WHICH A GIVEN MEASUREMENT MAY DEVIATE FROM
THE TRUTH. Since truth is related to the uses to which 
measurements are put, and since measurements are pie- 
ces of information applicable in a wide variety of 
contexts and problems, it MUST BE POSSIBLE TO FIND 
ACCURACY MEASUREMENTS which ARE APPLICABLE IN SUCH A 
WIDE VARIETY OF CONTEXTS AND PROBLEMS. The problem of 
accuracy is then to develop measures that enable the 
user of the measurement to evaluate the information 
contained in the measurements. (p.92) 


CONTROL is the long-run aspect of ACCURACY. It provides
the guarantee that measurements can be used in a
wide variety of contexts. In other words, a control
system for measurement provides OPTIMAL INFORMATION
ABOUT THE LEGITIMATE USE OF MEASUREMENTS UNDER VARYING
CIRCUMSTANCES. (p.93)


One of the most significant aspects of modern science
is the realization that one does not measure unless one
also measures the ERROR of measurement. (p.101)


A scientist realizes that without some estimate of error
HIS MEASUREMENTS ARE MEANINGLESS. But accountants and
managers want their cost data "exact". They think of
"cash on hand" as the most PRECISE measurement because
there can be relatively little error in this figure.
What they do not seem to realize is that a precise
figure in this sense of precision also contains very
little information about the state of the system. Or,
rather, if a firm's goal is to learn, it learns least
from precise figures. One might try to conceive of
independent judgements of costs as the "elementary
observations" that statistical theory requires, in an
attempt to use statistics in other than its strong
orientation towards statistical deviations in controlled
experiments. (p.335)


Measurement includes the process of CONTROL. In other
words, measurement is an organization of experience in
which information is "fed back" concerning the ACCURACY
of the measurements. "Accuracy" entails information
about the possible deviations of the measurements from
reality. This may be interpreted as meaning that ACCURACY
is information about the VALUE OF THE MEASUREMENTS FROM
THE POINT OF VIEW OF THE OUTCOMES OF THE ACTIONS WHICH
HAVE BEEN PARTIALLY DETERMINED BY THEM. One of the most
significant results of modern scientific method has been
the ABILITY TO ESTIMATE ACCURACY WITHOUT KNOWING EXACTLY
WHAT REALITY IS, THAT IS, WHAT THE BEST ACTION IS. (p.101)


ACCURACY AND CONTROL are the concepts which define the 
consistency of measurement reports. The concept of ACCU- 
RACY OF MEASUREMENT can be used in at least two senses. 
First, a measurement process may fail to be accurate in 
the sense that it is not consistent. For example, REPETI- 
TIVE OBSERVATIONS DIFFER "TOO MUCH" OR FAIL TO AGREE SUF- 
FICIENTLY WELL WITH THE FORMAL STIPULATIONS. Second, a 
measurement process, though consistent, may have VERY 
POOR ACCURACY FOR A SPECIFIC PURPOSE. Thus, we can say 
that a set of data are inaccurate and mean either that 
the set is inconsistent relative to certain formal rules, 
or that the set has a very low measure of accuracy. (p.127)


CONTROL is the process of deciding when to test for 
ACCURACY and what corrective action to take when it is 
decided that the accuracy requirements are not met. (p.128) 


Normally, control is said to exist only if the adjusted
observations are statistically consistent (statistical
control). But it may be that control defined in terms of
many repetitions of adjusted observations is too narrow
for measurements made outside of the laboratory or
outside a precisely controlled production line. IF
SCIENTIFIC METHOD IS TO BE EXTENDED TO DECISION-MAKING IN
GENERAL, THE IDEALS OF ACCURACY AND CONTROL WILL ALSO
HAVE TO BE REDEFINED. (p.129)


Measurement is sometimes described as the assignment of
numbers to things, but it may be far more useful to
define it as the activity of creating PRECISE, ACCURATE,
and GENERAL information.


PRECISION and ACCURACY enable us to make refined choices 
and hence reduce the risk of ERROR. If I say to you, 
"Take the bus to get to my home", I am being imprecise 
though perhaps accurate because taking some bus is the 
only feasible way to get there. If I say, "Take the 43 
bus at Market and Fillmore leaving at 5:00 P.M. week- 
days", I am being precise, but perhaps not accurate if 
no such bus runs at that time. 


"GENERAL" information is information that can be used in
a wide variety of times and places. If the bus schedule
changes each day, my precise information may not be
general; I could make it general by giving you a
day-to-day schedule, so that no matter when you arrived
you would know when to catch the bus. (p.161)


In the context of VALIDITY of measurements: the root
meaning of the word validity is the same as that of the
word VALUE - both derive from a term meaning STRENGTH.
The usual characterization of a valid measurement is
that it "measures what it purports to measure". The
validity of a measurement refers then to its VALUE or in
WHAT SOMEBODY IS ABLE TO DO WITH IT. Close to the latter
meaning is the possibility to regard THE VALIDITY OF A
MEASUREMENT AS A MATTER OF THE SUCCESS WITH WHICH THE
MEASURES OBTAINED IN PARTICULAR CASES ALLOW US TO PREDICT
THE MEASURES THAT WOULD BE ARRIVED AT BY OTHER
PROCEDURES AND IN OTHER CONTEXTS. (p.198-199)


The ERROR of measurement is itself a measure of our fai- 
lure to achieve what we aspired to; validity is a matter 
of the scientific significance of our aspiration. The 
study of sources of error affecting the validity of
measurements introduces new concepts such as sensitivity,
reliability, accuracy and precision.


One source of error is insufficient SENSITIVITY, which is 
a measure of the discriminating power of an instrument or 
procedure of measurement. 


A second type of error is associated with the concept of
RELIABILITY, which is a measure of the extent to which
a measurement remains constant as it is repeated under
conditions taken to be constant. Among these conditions
the observer making the measurements is of particular
importance. Accordingly, reliability is often interpreted
as a kind of INTERSUBJECTIVITY: the AGREEMENT OF
DIFFERENT OBSERVERS on the measures to be assigned in
particular cases. But changes in the circumstances of
measurement other than the identity of the person making
the measurements are also involved in reliability.


A measurement which is free of systematic error is said
to be ACCURATE. This is not to be confused with PRECISE, 
an attribute which depends on reliability as well as on 
sensitivity. 


What is RANDOM ERROR and what is SYSTEMATIC ERROR depends
on what we are taking into account in the assignment and
interpretation of our measures. As Coombs puts it, "the
measurement theory assumed in analyzing data becomes a
part of those data, and such portions of the data which
are incompatible with the a priori abstract system are
rejected and regarded as constituting (random) error
variance." A systematic error, in short, is one due to a
factor whose effect was presumed to be already
incorporated in the theory of that measurement; effects
due to other factors are called random. (p.199-201)


What was said above suggests the need of a concept of 
truth and of true measure. What we can say is something 
along the following lines. 


As we increase the sensitivity, reliability, and
accuracy of our measurement of some magnitude, we find
(or hope to find) that the measures increasingly exhibit
a CONVERGENCE TOWARD SOME PARTICULAR VALUE. This value
can usefully be dealt with as the mathematical limit
toward which the measures tend. THE "TRUE MEASURE" OF THE
MAGNITUDE IS NOTHING OTHER THAN THIS LIMIT.


Instead of saying that a new procedure or instrument of 
measurement is an improvement over the old because it 
comes closer to the "real value" of the magnitude, it may 
be less misleading to say that it is an improvement be- 
cause the "true measure" specified in its terms is more 
useful scientifically than the old "truth" was. (p.201-216)


Even if a particular measurement were quite free from
error and wholly exact, replications of the measurement
would almost certainly fail to yield always identical
measures. Both our concepts and the contexts in which
they are applied are open to some extent: DIFFERENT
OBSERVERS WILL HAVE SOMEWHAT DIFFERENT CONCEPTIONS, AND
WILL VIEW SOMEWHAT DIFFERENTLY WHAT WE CALL THE "SAME"
SITUATION. TO OBJECTIFY THE RESULTS OF INQUIRY WE MUST
PROVIDE SOME DEGREE OF INTERSUBJECTIVE CONSTANCY. As
Savage suggests, statistics may be seen as dealing with
VAGUENESS AND WITH INTERPERSONAL DIFFERENCE IN DECISION
SITUATIONS, EXPLOITING SIMILARITIES IN THE JUDGEMENTS OF
CERTAIN CLASSES OF PEOPLE, and seeking devices, notably
RELEVANT OBSERVATION, that tend to minimize their
differences. A NUMBER OF OBSERVERS EACH MAKING HIS OWN
ESTIMATE OF A CERTAIN MAGNITUDE, OR A SINGLE OBSERVER
MAKING ESTIMATES ON SUCCESSIVE OCCASIONS, provide
findings to be
reduced to some underlying unity, or less divergent set.


If, in an experiment, one value obtained by the particular
measurement process is a long way from the other
values in a SERIES OF REPLICATE DETERMINATIONS OF THE
SAME CONSTANT MAGNITUDE, or if for instance in a least-
squares analysis one reading is found to have a much
greater residual than the others, THERE IS A TEMPTATION
TO REJECT IT AS "SPURIOUS" OR "OUTLIER".


The temptation arises from the experimenter's feeling 
or JUDGEMENT that in this way he can minimize the loss
of so-called ACCURACY of the experiment due to the two 
possible ERRORS: rejecting a VALID observation or accep- 
ting a defective one. 


Several outstanding statisticians have given attention
to this problem, which has been recognized for more
than a hundred years. Some of their thoughts may be
summarized as follows.


SOURCES OF VARIABILITY IN READINGS 


Variability or dispersion in a set of observations can
be seen as arising from several different sources. If
we are for instance investigating the height (stature)
of persons employed at a particular place we may have
variability due to:


1. INHERENT VARIABILITY. It would be observed in the
population even if all measurements were PERFECTLY
ACCURATE. It cannot be reduced without changing the
population itself, THE OBJECT OF THE STUDY. If we
are interested in the MEAN stature of the population,
we may refer to the variability as "error" since it
gives rise to estimation error; but the name is
misleading. In connection with the concept of
"population" there appears also what statisticians may
call "error of contamination": it occurs when a certain
proportion of the observations come from a population
which is SIGNIFICANTLY DIFFERENT from the one in which
the experimenter is interested, and there is no way to
discover which populations yield which observations.


2. MEASUREMENT ERROR. It is due to the measuring
instruments. In measuring height, if readings are made to
the nearest centimeter, it is usually assumed that
measurement error should not exceed half a centimeter,
but in fact it sometimes does. One may also count as a
measurement error any ARITHMETICAL MISTAKE in
reducing the original notebook entries to the form
in which they are quoted as observations (e.g.
"clerical errors").


3. EXECUTION ERROR. It is intended to denote any
DISCREPANCY BETWEEN WHAT IS INTENDED TO BE DONE AND WHAT
IS ACTUALLY DONE, other than error in the use of
measuring instruments. Here should also be included the
above-mentioned errors of "contamination", for example
including in the sample of measurements the height
of some person not belonging to the population, as well
as measuring something other than height, or selecting a
biased sample of the population.
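

The distinction between inherent variability and
measurement error can be made concrete with a small
simulation. The following Python sketch is only an
illustration under stated assumptions: the population mean
of 175 cm, the standard deviation of 7 cm and the name
simulate_heights are hypothetical and are not taken from
the cited literature. It draws "true" statures, records
them to the nearest centimeter, and separates the spread
of the population from the rounding error of the reading.

```python
import random
import statistics


def simulate_heights(n, true_mean=175.0, inherent_sd=7.0, seed=0):
    """Return (true, recorded) statures in cm for n persons.

    true_mean and inherent_sd are hypothetical values chosen
    only for illustration.
    """
    rng = random.Random(seed)
    true_heights = [rng.gauss(true_mean, inherent_sd) for _ in range(n)]
    # Measurement step: each reading is made to the nearest centimeter.
    recorded = [float(round(h)) for h in true_heights]
    return true_heights, recorded


true_heights, recorded = simulate_heights(1000)

# Inherent variability: present even with perfectly accurate measurement.
inherent_spread = statistics.pstdev(true_heights)

# Measurement error: difference between the reading and the true value.
measurement_errors = [r - t for t, r in zip(true_heights, recorded)]
largest_error = max(abs(e) for e in measurement_errors)

print(f"spread of true statures: {inherent_spread:.2f} cm")
print(f"largest rounding error:  {largest_error:.2f} cm")
```

In this idealized sketch the rounding error cannot exceed
half a centimeter. Larger measurement errors, such as a
misread scale, and all execution errors are deliberately
left out of the simulation.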


CRITERIA FOR REJECTION 


One of the most important results of finding an apparen- 
tly "wild" or otherwise anomalous observation, i.e. an 
"outlier", can be the CORRECTION OF A FLAW IN THE MEASU- 
REMENT PROCESS, or - even better - the creation of NEW 
INSIGHTS INTO THE PHENOMENA UNDER STUDY. 


This presents one basic difficulty in finding criteria 
for rejection of outliers. Furthermore: can realistic 
rejection models be worked out for cases when the
probability of a blunder, e.g. missing an observation,
depends on the value that would have been observed if the
blunder were not present ?


IT APPEARS THAT THE BASIC CRITERIA FOR REJECTION IN
STATISTICAL MATERIAL DEPEND ON WHAT WE ARE AFTER AND
ON THE NATURE OF OUR MATERIAL. If our observations are
five determinations of the percent of chemical A in a
mixture, and one observation is badly out of line, A 
CHECK OF THE EQUIPMENT MAY SHOW that the outlier stemmed 
from an equipment MISCALIBRATION that was present only 
for the one observation. If the GOAL OF THE EXPERIMENT 
is only to estimate the percent of A in the mixture, it 
would be very natural simply to omit the wild observa- 
tion in case we cannot correct for the magnitude of the 
miscalibration. However if the goal of the experiment 
is that of INVESTIGATING THE METHOD OF MEASURING the 
percent of A (say in anticipation of setting up a routi- 
ne procedure to be based on one measurement per batch), 
then it may be very important to keep the wild observa- 
tion in. IN THIS WAY WE CAN LEARN SOME LESSON ABOUT THE 
METHODS OF SAMPLING, MEASUREMENT, AND DATA REDUCTION (as 
opposed to the underlying physical phenomenon). 


As another example suppose that 50 bombs are dropped at
a target in a military operation, that a few go wildly
astray, that the fins of these wild bombs are observed to
have come loose in flight and that their wildness is
unquestionably the result of loose fins. IF WE ARE
CONCERNED WITH THE ACCURACY OF THE WHOLE BOMBING SYSTEM,
we certainly should not forget these wild bombs. BUT IF
OUR INTEREST IS IN THE ACCURACY OF THE BOMBSIGHT, the wild
bombs are irrelevant.


Another approach to the problem of outliers recognizes 
that it is not basically a problem of rejection, which 
may typically be treated with the method of significance 
tests. It is not so often a matter of studying whether 
and how often outliers occur in a certain field, but 
rather a study of guarding oneself from their adverse 
effects by answering the typical "insurance policy" 
questions:
1. What is the "premium" ?
2. How much protection do I get in the event of error ?
3. What is the probability of error ?


leading to a compromise between rejecting a valid
observation and accepting a defective one. Many studies
of the rejection of outliers have focused on the third
question, while obviously all three are important, since
e.g. a low premium and good protection decrease or
eliminate the need for an answer to the third question.
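

The first two "insurance policy" questions can be explored
numerically. The Python sketch below rests on assumptions
of our own that should be read as one possible reading
only: the "premium" is taken as the relative increase in
mean squared error caused by a rejection rule when no
reading is spurious, and the "protection" as the relative
decrease when spurious readings do occur. The rejection
rule (drop the reading with the largest residual if it
exceeds a cutoff), the sample size, the error probability
and the size of the shift are all hypothetical.

```python
import random
import statistics


def plain_mean(sample):
    """Estimator without any rejection rule."""
    return statistics.fmean(sample)


def mean_after_rejection(sample, cutoff=2.0, sigma=1.0):
    """Drop the reading with the largest residual if it exceeds cutoff * sigma."""
    m = statistics.fmean(sample)
    worst = max(sample, key=lambda x: abs(x - m))
    if abs(worst - m) > cutoff * sigma:
        kept = list(sample)
        kept.remove(worst)
        return statistics.fmean(kept)
    return m


def mean_squared_error(estimator, error_prob, n=5, trials=20000,
                       shift=5.0, seed=1):
    """Monte Carlo mean squared error when the true value is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = []
        for _ in range(n):
            value = rng.gauss(0.0, 1.0)
            if rng.random() < error_prob:
                value += shift  # a spurious reading
            sample.append(value)
        total += estimator(sample) ** 2
    return total / trials


# Premium: price paid for the rule when every reading is valid.
premium = (mean_squared_error(mean_after_rejection, 0.0)
           / mean_squared_error(plain_mean, 0.0)) - 1.0

# Protection: gain from the rule when 10 % of readings are spurious.
protection = 1.0 - (mean_squared_error(mean_after_rejection, 0.10)
                    / mean_squared_error(plain_mean, 0.10))

print(f"premium:    {premium:.3f}")
print(f"protection: {protection:.3f}")
```

Both estimators are evaluated on the same simulated
samples (same seed), so that the comparison is not blurred
by sampling noise. The third question, the probability of
error, enters only as the parameter error_prob.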


Seen in still another dimension, the problem of rejection 
of outliers is one of increasing complexity according to 
the following scale based on degrees of KNOWLEDGE ABOUT 
APPARENTLY WILD OBSERVATIONS: 


1. We know even BEFORE an observation that it is likely
to be wild, e.g. because of a physical incident that 
occurred to the equipment. 


2. AFTER the observation we can reconstruct a causal
pattern by checking with e.g. a laboratory notebook 
or by retrieval from memory of historical data. 


3. WITH NO OTHER EVIDENCE, we want to reject the outlier 
only based on the PATTERN OF THE OBSERVATIONS
THEMSELVES.


Finally, besides the previously mentioned errors
of so-called contamination, measurement and execution,
statisticians may also justify treating the data by some
method of outlier rejection on the premise that OUTLYING
OBSERVATIONS ARE INHERENTLY MORE DIFFICULT TO OBSERVE
AND RECORD so that their PRECISE VALUES are less
TRUSTWORTHY. It is usual in such cases to speak of
observations that are INACCURATE rather than SPURIOUS.
Statistical techniques have been developed for treating
or "censoring" a few values on each extreme ("tail")
of the distribution.


(F.J. Anscombe, 1960; T.S. Ferguson, 1961; W.H. Kruskal, 
1960a and 1960b) 
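

Two common ways of treating a few values on each tail can
be sketched as follows. This is a minimal illustration and
not necessarily the technique the cited authors have in
mind: trimming drops the k smallest and k largest
readings, while Winsorizing replaces them by the nearest
retained value. The function names and the five readings
are hypothetical.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this sample")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, k):
    """Mean after pulling the k extreme values on each tail inwards."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this sample")
    low, high = ordered[k], ordered[len(ordered) - k - 1]
    clipped = [min(max(v, low), high) for v in ordered]
    return sum(clipped) / len(clipped)


# Hypothetical replicate determinations with one wild reading.
readings = [10.1, 10.3, 9.9, 10.2, 14.8]

print(sum(readings) / len(readings))   # ordinary mean, pulled upwards
print(trimmed_mean(readings, 1))
print(winsorized_mean(readings, 1))
```

Neither function decides whether the wild reading is
spurious; both merely limit its influence, which is the
sense in which the values are called inaccurate rather
than spurious.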


HISTORICAL CRITICISM


It is an aim of historical research to DRAW INFERENCES
ABOUT THE PAST THAT ARE IN SOME WAY VERIFIABLE. For
this purpose it utilizes several kinds of remnants,
as in archeological research, but also many available
reports in narrative form, etc.


SOURCE CRITICISM 


Typically a historian recognizes the need to evaluate 
a historical SOURCE on the basis of three main dimen- 
sions: 


1. GENESIS. That is its coming into being: when and how,
WHO determined such an event - what person, private
or public organization WITH WHAT INTERESTS. The
situations around the origin of the source lead to a
common classification of the information along its
DEGREE OF PRE-PROCESSING:

a) ORIGINAL DATA, which are the oldest data available,
e.g. accounting information in a firm, and on which the
following is based:

b) RAW MATERIAL or PRIMARY MATERIAL, e.g. the
filled-in forms that the firm has prepared at the
request of some state agency.


The raw material is furthermore seen as originating
b1) PRIMARY STATISTICS for which the material was
expressly obtained, and

b2) SECONDARY STATISTICS which is the result of
processing that was not envisaged at the time the
material was obtained.


2. CONTENT. The source is classified as a 
FIRST-HAND SOURCE or as a 
SECOND-HAND SOURCE 
according to its DISTANCE TO THE HISTORICAL SITUATION. 
Does the information refer to something that the re- 
porter himself has seen or heard, or are there several 
links between the event and the reporter ? It is also 
important to consider what FORM OF EVIDENCE is offered 
by a second-hand source: a picture, a copy of a
document or merely a repetition of a rumour.


The above classification of sources overlaps with the
previously mentioned classification according to the
degree of pre-processing: ORIGINAL DATA AS WELL AS
RAW MATERIAL MAY BE EITHER FIRST-HAND OR SECOND-HAND
SOURCES' PRODUCTS. For instance, advertisements for
political meetings appearing in available copies of
newspapers are original data; however they are
first-hand for an investigation of volumes of political
advertisements while they are second-hand for an
investigation about times, places, and speakers at the
meetings. Analogous points can be raised regarding Customs
reports on quantities and values of goods exported to or
imported from certain countries.

Quantitative analysis of source contents gives rise
to certain definitions of so-called RELIABILITY,
RELEVANCE, and VALIDITY. For instance, in an
investigation of written material published by the press
on political questions, the articles are coded in CLASSES
OF POLITICAL MATTERS and the VOLUME of the writings
is measured, for example, in number of lines.

The RELIABILITY of the investigation is then said to
MEASURE THE PRECISION of the measurements, and it is
a FUNCTION OF DIFFERENCES OBTAINED BY DIFFERENT
RESEARCHERS performing the same investigation. If the
same investigation also aims to measure the INVOLVEMENT
of the political parties in the political debate,
it might attempt to measure the frequency of the
PARTIES' NAMES per e.g. 100 lines of press-text. Is
such a measure an expression of involvement ? Are
such names a source with RELEVANCE for the question
that was asked ? If not, the investigation will have
low VALIDITY.


3. FITNESS FOR USE. This refers to the use of the source
to answer the posed questions. Such evaluation is based
on two dimensions: relevance and credibility.

a) RELEVANCE. An example is the reporting of Customs
authorities about charges and receipts of duties.
They are directly relevant for an investigation of
incomes to the State, while they must - if at all
possible - be adjusted for smuggling and duty-free
goods when used in investigations of volume of
trade. THE RELEVANCE IS THEN RELATED TO THE USE
AND GOAL OF THE USE OF DATA.

b) CREDIBILITY. It is evaluated on the basis of the
INTERNAL CONSISTENCY of the report, its "probability"
(based e.g. on commonly accepted truths),
the reporter's judged possibilities to understand,
notice, and reproduce what is described, and possibly
his subjective qualifications and reputation.
It is, for instance, hardly credible that in an
armed conflict one party can count at the end of
each day the enemy's casualties down to the last
man or airplane.
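

The quantities mentioned under CONTENT above (volume in
number of lines, frequency of party names per 100 lines,
and reliability as a function of differences between
researchers) can be sketched in Python. The sketch is
hypothetical throughout: the function names, the class
labels and the figures are invented for illustration, and
the mean absolute difference is only one of many possible
functions of the differences between two coders.

```python
def mentions_per_100_lines(mentions, total_lines):
    """Frequency of party-name mentions per 100 lines of press text."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 * mentions / total_lines


def mean_absolute_disagreement(coder_a, coder_b):
    """Average difference, in lines per class, between two coders.

    A smaller value is read here as higher reliability.
    """
    classes = sorted(set(coder_a) | set(coder_b))
    diffs = [abs(coder_a.get(c, 0) - coder_b.get(c, 0)) for c in classes]
    return sum(diffs) / len(diffs)


# Lines of text assigned to each class of political matters
# by two researchers coding the same newspaper issues.
coder_a = {"taxation": 120, "defence": 80, "education": 40}
coder_b = {"taxation": 110, "defence": 90, "education": 40}

disagreement = mean_absolute_disagreement(coder_a, coder_b)
frequency = mentions_per_100_lines(mentions=12, total_lines=400)

print(disagreement)
print(frequency)
```

Note that the code can say something about reliability
only. Whether the frequency of names expresses
involvement, i.e. the question of validity, is not
answered by any computation on the counts themselves.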


Most other problems related to source criticism which
appear in historical research literature are known in
the context of statistical method. One outstanding
problem appears to be the DEFINITION OF THE POPULATION in
terms of TIME, SPACE and the ATTRIBUTES OR QUALITIES of
its ELEMENTS OR INDIVIDUALS. This problem is the
background of some of the main difficulties and errors in
investigations, e.g. those related to CHANGES in
geographical-administrative limits of territory, in
classification-allocation among categories-codes, or
related to so-called "non-responses" or "missing"
observations.


We will now illustrate the application of this theoreti- 
cal framework to some concrete examples and develop such 
examples in the context of sources of ERRORS in case 
studies.


SOURCES OF ERRORS IN CASE STUDIES


To take the terminology question first, case studies on
the FITNESS FOR USE of a source (which was seen to be
evaluated in terms of RELEVANCE and CREDIBILITY) show that
both relevance and credibility are affected by specific
types of errors.


RELEVANCE may be affected by ERRORS in, or simply by
CHANGES in, data-collection or classification procedures:
an increase in the reported rate of crimes may be caused
by a more efficient reporting system rather than by an
actual increase in the number of committed crimes. Or
the definition of "crime" itself might have changed in
the meantime, leading to the inclusion among crimes of
events that earlier were not considered as such, in spite
of occurring as often as now. Or the rate may stay
constant in spite of the crimes leading to more serious
consequences.


CREDIBILITY is said to depend partly on the COMPLETENESS
and partly on the CORRECTNESS of the statements.

a) COMPLETENESS is said to be affected if for instance,
when trying to count the population in a region by means
of a direct method, a great number of the people hide
outside the region with the intent of not being registered
(e.g. because they fear a heavier taxation).

b) CORRECTNESS is said to depend on the goodwill and
capability of those who gave the statements or delivered
the data: peasants will report greater numbers of
live-stock if they believe that the report will be used
for allocation of fodder or financial support than
if they suspect that it will be the basis for taxation;
furthermore it may be impossible to count the live-stock
down to the last unit at the end of a given day.


The evaluation or estimation of ERRORS in historical
statistical material is said to be possible by means of
two methods:

a) CONFRONTATION OF INDIVIDUAL STATEMENTS, as exemplified
in investigations that compare the live-stock figures in
taxation records with corresponding figures in documents
on the distribution of inherited stock among heirs.

b) STATISTICAL ANALYSIS of the so-called "REASONABLENESS"
of sums and results. It is typical of population
statistics and is based on well known
probability-distribution thinking.


(As an additional case, Morgenstern cites Hans Delbrück
who found that if the Greek claims regarding the strength
of the Persians at Thermopylae were true, there would not
even have been room for the Persian troops to occupy the
battlefield. Or, given the roads of the time, the last
Persian troops would have just crossed the Bosporus when
the first already had arrived in Greece.)
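

The first method, confrontation of individual statements,
can be sketched as a comparison of two independent records
of the same units. The Python fragment below is a
hypothetical illustration: the function name confront, the
farm identifiers, the live-stock figures and the tolerance
of 10 % are all invented, and a real investigation would
have to justify its own tolerance.

```python
def confront(record_a, record_b, tolerance=0.10):
    """Return units whose two statements differ by more than tolerance.

    The discrepancy is expressed relative to the larger of the
    two figures. Units present in only one record are ignored.
    """
    flagged = {}
    for unit in sorted(set(record_a) & set(record_b)):
        a, b = record_a[unit], record_b[unit]
        reference = max(a, b)
        if reference == 0:
            continue  # both statements report nothing
        relative = abs(a - b) / reference
        if relative > tolerance:
            flagged[unit] = relative
    return flagged


# Head of live-stock per farm according to two kinds of documents.
taxation_records = {"farm_1": 8, "farm_2": 12, "farm_3": 5}
inheritance_documents = {"farm_1": 12, "farm_2": 12, "farm_3": 5}

flagged = confront(taxation_records, inheritance_documents)
print(flagged)
```

A flagged unit shows only that the two statements
disagree. Which of them is wrong, and why, remains a
matter of source criticism.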


We will now take a look at errors in population, social
and economic statistics from a historical perspective.
What is named as "political statistics" overlaps in many
respects with economic statistics and will be included
by us in the latter.


POPULATION OR DEMOGRAPHIC STATISTICS


It deals with births, deaths, marriages, fertility, and 
migration. Historical research in this area has dealt 
with e.g. size and changes in size, and mobility. 


When trying to determine past yearly changes in size of
population, based on registers held by national or local
authorities, it has been proposed on one occasion that
the agents of the authorities deleted the poorest people
from the register in years of bad economic situation:
such people could then be temporarily relieved from
taxes. A measure of the size of the population may in
this case be looked upon as an economic indicator !


Later investigations of such problems have considered
technical aspects of the registration such as
substitution of clerks, issuance of new rules for
registration, local differences in accounting rules or
inflexibilities in rules of cancellation, writing off etc.
Figures on migration were obtainable only in those cases
when registration was supplemented by a continuous system
of transactions, rather than exclusively based on
periodical counts. Many errors in population registers
have been assigned to the registrars' unsatisfactory
training in bookkeeping, dullness of work, or lack of
motivation to register people who were regarded as
"DEVIANT" RELIGIOUSLY OR POLITICALLY. Clerical
misunderstandings included cases when stillborn children
(dead at birth) were registered as dead but not as born.
It is estimated that in 18th-century Denmark and Norway
there was about 5 to 10 % underregistration of the
numbers of born and dead people.
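

The effect of such an estimate on a registered figure can
be worked out with simple arithmetic. The Python sketch
below assumes, as our own reading, that an
underregistration of r means that a fraction r of the
actual events is missing from the register; the function
names and the figure of 1000 registered births are
hypothetical.

```python
def corrected_count(registered, underregistration_rate):
    """Estimate the actual number of events behind a registered count."""
    if not 0.0 <= underregistration_rate < 1.0:
        raise ValueError("rate must lie in [0, 1)")
    return registered / (1.0 - underregistration_rate)


def corrected_range(registered, low_rate=0.05, high_rate=0.10):
    """Bounds on the actual count for a range of underregistration rates."""
    return (corrected_count(registered, low_rate),
            corrected_count(registered, high_rate))


low, high = corrected_range(1000)
print(round(low), round(high))
```

Under this reading a register showing 1000 births would
correspond to somewhere between roughly 1050 and 1110
actual births.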


Discrepancies between the situation in which the original
data appeared, and the situations in which such data are
later interpreted, occur when population dynamics
must be inferred from available documentation. Registers
of burials stand for mortality, clerical registers of
marriage ceremonies stand for marriages, and baptisms
stand for births. Summary tables of data WERE
SYNTHESIZED from partial tables, preventing the kind of
checks possible through comparison with actually original
data lists. Special inconsistencies were caused by the
earlier habit of not using "non-existence" or "absence"
files for people who had disappeared without trace.


SOCIAL STATISTICS 


It is usually concerned with either the SPECIFIC
INDIVIDUAL (language, education, family relationships,
income, property), or with SOCIAL ACTIVITIES (such as
health assistance, economic support, education, judicial
system) or finally with data concerning the SOCIETY IN
FUNCTION (i.e. unemployment, cost of living, salary
trends, housing).


Errors in such statistics could in some cases be traced
back to data-collection forms which were changed for
the purpose of certain kinds of improvements (such as
decreasing misunderstandings in the process of filling in
the forms) at the cost of destroying the possibility of
comparing data from successive periods of time.
"Language" could in one case be filled in upon the
statement of the respondent while in another case it was
the registrar's own opinion. "Profession" could be
dependent on the kind of branch - industry, or
alternatively on the content of the work - in some other
sense.


ECONOMIC STATISTICS 


POLITICAL STATISTICS specifically is said to deal with
national and local financial statistics, and with
elections (including voters, elected, and press). Specific
problems arise because of the SECRECY of certain financial
data, the earlier non-existence of fiscal unity in
financial transactions, and differences in currencies.
Special pitfalls come e.g. from the use of files on
national revenues from taxation for the purpose of
inferring the distribution of income and property.


ECONOMIC STATISTICS, properly defined, is said to deal
with PRODUCTION, LABOUR, and CAPITAL as descriptors of
the economic situation. It is found that original data
having LEGAL IMPORTANCE (such as proof of property) were
the ones most carefully conserved. It is also
found that those documents which were most suited for
quantification offered pitfalls because of
NON-COMPARABILITY between successive periods of time, or
because they had low relevance for the purpose on hand:


In agricultural statistics, figures on cultivated areas 
were affected by errors because of inconsistencies in 
data-collection from one period to the next, or because 
of shifting definitions which were hidden by the AGGRE- 
GATION OF FIGURES prior to the analysis. Estimates on 
volume of harvests were affected by variations of money 
value, since original documents evaluated harvests on 
the basis of the at-the-time actual values. In modern
statistics, special controls are made through individual 
interviewing of sampled farmers. 


In foreign trade statistics the original data may be
obtained from Customs' files on import and export.
Control of the effect of smuggling on the figures is
performed through comparison between the files of
different Customs stations or between the files of export
and import firms. Foreign trade value figures were
inferred from quantity figures since Customs duties were
related to quantities. The values shown in Customs files
were determined through a central or local estimate, or
through a request of data from the exporter-importer,
leading to inconsistencies about whether the value
referred to was at sending or at destination. Land of
origin was often found to have been equated to the last
land touched at prior to arrival. Land of destination was
in an analogous way erroneously equated to the first land
touched at after departure.


Statistics on handicraft and industry were plagued by
inconsistent classifications, resistance by respondents to
furnishing the requested information, and uncontrolled
data-collection procedures.


Statistics on prices became necessary for national
authorities when taxes "in natura" were to be evaluated
in money or when foreign trade quantities were to be 
translated to balance of payments. Prices may be infer- 
red from private bookkeeping, and from price tariffs or 
quotations whose interpretation is strongly dependent 
upon the particular method of calculation. 


Ambitious data-collection was possibly associated with
great volume of collected data, but also with loose
rules and control.


(B. Schiller & B. Odén, 1970) 


SUGGESTIONS FOR FURTHER RESEARCH


METHODS FOR SYSTEMS ANALYSIS 


We already mentioned in chapter 5 the need to complete
the structure of an elementary message (in Langefors'
sense, 1968b, p.183) with the ERROR of the measure as a
characterization of the measurement or observation
process that produced the particular value. We also
mentioned the need to include in Langefors' precedence
analysis (1968b, p.67) some "redundant" precedents along
the lines of our proposal, in order to allow computation
of error. We will now illustrate particularly this last
point with a simple example of systems analysis applied
to the description of data-processing for a car-repair
shop. We shall use the lately developed methods for
drawing of precedence graphs, extended from M. Lundeberg's
illustration (1970, p.180) of Langefors' ideas.


A detailing or "amplification" of process 3 leads to
the following partial enlargement of the previous figure.


An interesting implication of our paper which we suggest
as object of further research is the possibility of
regarding 31A essentially as the same thing as 3A, and 31B
as the same as 3B. They both are computed by means of
certain rules or measurement processes and their relation
could be used for computation of error of the cost
estimate. This amounts to recognizing that the fundamental
nature of data-processing is to predict. According to
our proposal the enlargement of process 3 in the second
figure has simply introduced the "control observation"
of an independent observer, the customer, who is allowed
to negotiate on the magnitude of 31A and 31B.


The information sets 31A and 31B, then, correspond to
the information 5A in figure 4.10, while further analysis
of the figures would possibly uncover the nature of the
negotiation process and of the "objective" predicted or
measured cost (invoice) in this simple case. It should
be noted that similar analysis may be made on other
information sets of the graph for the repair-shop. As in
the case of results of requirement generation in a
manufacturing plant, the replenishment order for parts to
the shop, as computed by the data-processing system (5A in
the enlarged second figure) is itself only an "estimate"
which may be submitted for negotiations to the purchasing
department, prior to being sent to the vendor. The
information sets 3A and 3B are the only available
description of 34. It "exists" only in terms of
descriptions.


In order to generate further suggestions for research,
we will explore the meaning of the graph-language for
description of information processes. With a view on the
group of information sets and processes DA, 3h. 55H 
of figure 4.10, or alternatively the group in the first
overview of data-processing for the repair shop, we
abstract the following basic block.


Some interesting questions arise if we ask ourselves
what are the implications of 3A being "wrong". Then,
using the figure we come to wonder whether the cause
is wrong 1A, wrong 2A or wrong 3. If we concentrate
on 3 we may ask ourselves how can the "actual" process
be wrong. If the process is performed by a computer
rather than in a human mind then we will say that the
actual process was wrong because of a hardware failure.
But "3" in the figure is a symbol that refers to
something, it is a description of something, it is
information too. Does it describe what should have
happened according to some other description (process
specifications) ? In such a case, what is the difference
between 3A and 1A ? Maybe 1A is the MATHEMATICAL
description of the process, while 3 is the PHYSICAL (for
instance in terms of electronics) description.


This kind of reasoning takes us back to chapter 4 and
to the von Neumann-Goldstine approach that was one of the
bases for our proposal. Maybe 1A is the mathematical
function and/or its translation to numeric-analytic
terms. Perhaps then, process 3 is the physical
translation of the numeric-analytic-binary description to
the electronics-physics description. In chapter 4 we
mentioned that such translation was only allowed because
of the integration of the theory of physics with
arithmetics, geometry etc. This is what permits in some
sense to "test" the truth of the overall set 1A,3. The
extension of this reasoning to the rest of the figure
suggests that 2A refers to the "concepts" and measurement
of the state of such concepts or objects.


It is obvious that we cannot discuss at once in terms of
several different "models" like the mathematical,
physical etc. When the output is "wrong", however, or in
order to test whether it is wrong we MUST in some way
integrate the partial models. This is perhaps the intent
of H. Simon when stating that one poses a problem by
giving the STATE description of the solution in the SENSED
WORLD. The task is then to discover a sequence of
PROCESSES in the ACTION WORLD that will produce the goal
state from the initial state. "Problem solving requires
continual translation between the state and process
descriptions of the same complex reality." (1969, p.112)


This relates the whole issue to the discussion by
Margenau (1966, p.332-341) and his emphasis that the
difference between primary, perceptory experience and the
concepts or constructs of the cognitive experience is
not merely semantic or linguistic (p.334-335). Actions
of the instrumental or operational definitions relate
our perceptory to our cognitive experience. In order to
apply Simon's problem-solving philosophy and Langefors'
precedence-component analysis to social phenomena one
should investigate which are the possibilities that
aggregations of information sets may result in the
social or psychological CONCEPTS equivalent to Margenau's
cited eigenfunctions of quantum mechanics. Such
possibilities may also determine the applicability of
precedence graphs to information processes in social
environments.


We think that what was said justifies our restraint
from drawing precedence-graphs in this study of quality
of information, and it appears to be consistent with 
several remarks that we found in the literature. 


M.E.Maron, for example, (1964,p.15) cites Uspenskii as 
pointing out that "in order to create an information 
language for a given subject, one must have a theory 
of that subject; one must know about the things in 
question, about their properties, properties of those 
properties, and so forth." 


Churchman (1963, p.8), after stating that the observing
mind partitions the class of meaningful assertions into
those that describe the reality of the observed mind,
and those that do not, continues: "Often, without loss,
the observing mind may take the set of assertions to be
the reality of the observed mind rather than a
description of it."


Several authors describe how, particularly in social
environments, the meaning of input, output, and process
becomes vague or breaks down, leading to false results.
See J. Schlesinger (1971, p.400), Gross (1971, p.367),
Buckley (1967, p.54, 168). Particularly worthy of
meditation is the elaborate construction that H.H. Goode &
R.E. Machol attempt to explain in order to differentiate
between INFORMATION versus MATERIAL systems (1957, p.315).
The kind of conceptual difficulties that it uncovers
are characteristic of later positivistically oriented
literature. The same is noticed in Chapanis (1951).


The alternatives may be seen in terms of the generalized
concepts of precedence and production as set forth
e.g. by Singer and found in the work of Churchman and
Ackoff (See Ackoff, 1962, p. 156,172). It is possible
that also A.Danielsson's approach gives some hints in
this direction (1963). Much hard work is apparently
required in order to translate such thoughts to
guidelines for systems analysis aimed at computerized
applications. Perhaps some further hints will be contained
in the latest book by Churchman (1971) which was
not yet available to us at the time this is written.


A final note to suggest that the mentioned possible
developments in methods of systems analysis may be
relevant even for more technical software matters. In a
personal communication (April 13, 1971) Prof. David L.
Parnas emphasizes that the "interface" between subsystems
or modules of software operating systems does NOT consist
only of their input/output flows of data. In Parnas' own
words, such interface consists also of the ASSUMPTIONS
that the modules make on each other. This means that we
can actually change a module without changing others only
to the extent that we do not affect the assumptions that
the others make (See information set 1A of figure 4.10).
Thus, it appears that such assumptions may be considered
as part of the factual content of boundary flows.
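

The point about assumptions can be illustrated with a
deliberately small Python sketch. It is our own
hypothetical example and is not taken from Parnas'
communication; all names are invented. Two versions of a
module accept and return the same kind of data, so the
input/output flow is unchanged, but only the first version
honours an assumption that a second module silently makes.

```python
def price_list_v1(raw_prices):
    """Original module: returns the prices in ascending order."""
    return sorted(raw_prices)


def price_list_v2(raw_prices):
    """Changed module: same input and output types, but the order
    of the input is kept, so the ascending order is no longer given."""
    return list(raw_prices)


def cheapest(prices):
    """Client module: silently ASSUMES that prices are ascending."""
    return prices[0]


raw = [30, 10, 20]
before_change = cheapest(price_list_v1(raw))  # assumption holds
after_change = cheapest(price_list_v2(raw))   # assumption violated

print(before_change, after_change)
```

Nothing in the data flow between the two modules has
changed in form, yet the result of the client is now
wrong. The ordering was part of the interface although it
was never written down as input or output.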


HUMAN THINKING AND MANIPULATION OF SYMBOLS 


There is apparently something in common between much 
work going on in so-called artificial intelligence, 
simulation of human thinking, automatic problem solving, 
question-answering and fact-deducing systems, data mana- 
gement, quantitative linguistics, etc. This common thing 
is that they are regarded basically in terms of
manipulation of symbols and that the writings about such
topics are often divorced from any philosophical
considerations or evaluation in terms of scientific
method. "Symbols" and "manipulation" have apparently
acquired a primary, self-sustained meaning that makes us
wonder how it is related to e.g. Margenau's statement on
the difference between primary-perceptory experience "P"
and conceptual "C" cognitive experience (1966, p.335):

"The difference between P and C is not merely semantic 
or linguistic; in fact language frequently obscures the 
difference. To note this is especially important for a 
fuller understanding of the method of science..." 


The implications of the above may be essential in order
to understand the implications and THE DANGERS of
symbol manipulation, which is often believed to create
knowledge by manipulating a number of related "facts"
plus their relationships. Knowledge and understanding are
then seen as limited by our computer-programming
capabilities as well as time-economic limitations of
hardware, memory, etc. Truth is often seen in terms of
logical truth, as implied by the VALIDITY of deductive
arguments or by TRUTH-FUNCTIONAL PROPOSITIONS. Validity is
predicated of any deductive argument in which it is
impossible to make the premises true while the conclusion
is false. A truth-functional proposition is a compound
proposition whose truth-value is completely determined by
the truth values of its component propositions: thus, if
we know the truth values of "p" and of "q" we can decide
the truth value of "p implies q". One may, then, also
conceive of the validity of CONDITIONAL PROPOSITIONS which
are propositions of the form "if p then q" where p is
the antecedent and q the consequent. (For an
introduction see "Logic" in The Encyclopedia Americana,
1958).
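

The two notions just defined can be checked mechanically.
The Python sketch below is a minimal illustration with
names of our own choosing (implies, is_valid): it tabulates
"p implies q" from the truth values of p and q, and it
tests validity by searching for an assignment that makes
all premises true and the conclusion false.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def is_valid(premises, conclusion, n_vars=2):
    """An argument is valid if no assignment of truth values makes
    every premise true while the conclusion is false."""
    for values in product([True, False], repeat=n_vars):
        if all(premise(*values) for premise in premises) \
                and not conclusion(*values):
            return False
    return True


# Truth table of the truth-functional proposition "p implies q".
truth_table = [(p, q, implies(p, q))
               for p, q in product([True, False], repeat=2)]

# Modus ponens: from "if p then q" and "p", conclude "q".
modus_ponens = is_valid([implies, lambda p, q: p], lambda p, q: q)

# Affirming the consequent: from "if p then q" and "q", conclude "p".
affirming_consequent = is_valid([implies, lambda p, q: q],
                                lambda p, q: p)

for row in truth_table:
    print(row)
print(modus_ponens, affirming_consequent)
```

The check is purely a matter of symbol manipulation. It
says nothing about whether "p" or "q" is true of anything
observed, which is the limitation discussed in this
section.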


And so go the arguments which the reader will probably
relate to propositional or sentential calculus, to some
of our reasoning in chapter 2, and to our discussions
of truth relations among input, method, process and
output. This appears to be the only possible discussion
about "truth" that symbol-manipulation allows. The need
for formalizing logic descriptions of complex reality
apparently leads to elaborate reconstructions like
Carnap's modal logic incorporating "necessary" to the
"and", "or", "not" terms. Then we get also a "temporal
logic" which incorporates time. "Nuances in input" perhaps
will be taken care of by the "Theory of Fuzzy Sets", while
in our approach we think they represent the scientific
problem of measurement.


We urge the reader to think about the implications of
how "decision-making in a fuzzy environment" (R.E.
Bellman & L.A. Zadeh, 1970) takes care of the problem
of quality of information: "Specifically, our contention
is that there is a need for differentiation between 


used adjectives as large, small, substantial, significant,
important, serious, simple, accurate, approximate,
etc." (p.B-141). Compare this approach with Ackoff's
discussion of the definition of red color (1962,p.160, 170).


It appears to us extremely important that all research
relying on logic realizes the role and limitations of
logic. "Logical consistency has no necessary priority."
(Churchman, 1948,p.192). Further discussion of the limitations
of logic is found in Kaplan (1964,p.3-18),
Shapere (1966,p.42), Churchman (1968b,p. 31-36, 68, 108-119).
It is not a question of "plugging the information
into the machine." Nor is it a question of,
as a top business executive once said, considering items
of information or "facts" as the material parts to be
combined by the computer "tool", which therefore need to
be standardized to obtain low cost and quick delivery
of machined information. See also Ferry (1971,p.211) and
Churchman (1968b,p.200) on education as "production".


In the same context we feel that a great danger is represented
by the so-called simulation of human thinking.
To illustrate this point consider the following
statements.


"A man, viewed as a behaving system, is quite simple.
The apparent complexity of his behavior over time is
largely a reflection of the complexity of the environment
in which he finds himself." (Simon, 1969, p.25)


"I do not propose here to develop in detail the idea
that the core of the behavior we call emotional derives
from a mechanism for interrupting the ongoing stream of
activity. However, this notion is consistent with a good
deal of empirical evidence about the nature of emotion
and provides an interesting avenue of exploration into
the relation of emotion to cognitive activity. It suggests
that we shall not be able to write programs for
computers that allow them to respond flexibly to a variety
of demands, some with real-time priorities, without
thereby creating a system that, in a human, we would
say exhibited emotion." (Simon, 1966,p.18)


We suggest that the above two statements, being capable
of directing future research in psychology and "artificial
intelligence", be submitted to deep criticism.


We think that a starting point for such criticism may
be found in the following cited work:


"...we have found it expedient to refer, somewhat vaguely,
to another metaphysical principle which I shall call
the requirement of simplicity and elegance. This has
replaced to some extent the older criterion of mechanical
intuitability or visual clarity of explanatory constructs.
Great scientists have always been impressed by
it, for they have sought simple laws, differential equations
of low order, spherical shapes for fundamental entities,
small and where possible integral numbers for
basic constants, and so forth. True, they did not always
get away with simple choices, and they replaced the
naive maxim of the simplicity of nature by the methodological
injunction that simplicity must always be sought
but ultimately distrusted. We should also note the logical
ambiguity of terms like simplicity and mathematical
elegance." (Margenau, 1966, p.340)


Churchman (1968b,p.123) cites Ashby: "Science has, of 
course, long been interested in the living organism; 
but for two hundred years, it has tried primarily to 
find, within the organism, whatever is simple..."

In another context (p.97) Churchman remarks that "reason
is not equivalent to what might be called calculation; 
for example, the processes carried on by a computer do 
not express all there is to be said about the concept of 
reason." And this may be related to Shapere's remark 
(1966,p.45) that "Wittgenstein warned that a great many 
functions of language can be ignored if language is 
looked upon simply as calculus..." 


It is difficult at this point to disregard the idea
that language as an expression of thought serves particularly
as a vehicle for a relationship to another person !
Additional criticism is implied, if read carefully,
by U.Neisser's remark on the two phases of the popular
(and we might add "and many scientists'") attitude towards
"artificial intelligence" (1963): "Yesterday's
skepticism was based on ignorance of the capacities of
machines; today's confidence reflects a misunderstanding
of the nature of thought."


Churchman, commenting on a possible attitude of the
scientist, writes: "He acts as though he believed that
people are information-processing machines. Indeed, in
one area of scientific research, called "artificial
intelligence", it is clearly assumed that intelligence
is a type of information processing, and hence computers
can think because we can get them to simulate the information
processing of people. It's strange how often the
critics of artificial intelligence object to the wrong
thing here; they are horrified at the suggestion that
computers can think, whereas they should be horrified
at the suggestion that people are information processors."
(1968a,p.124).


After a passage where he shows that reduction of biology
or psychology to physics may imply the disregard of all
those problems that historically originated the sciences
of biology and psychology (1968b,p.155), Churchman writes
"...If science can construct realistic descriptions in
a nonhuman manner, then the way it describes is really
inhuman." (p.189). This may be the background of the
apparent bankruptcy of the debate on "subjective" versus
"objective" in the context of scientific method, as suggested
in chapter 4 and by Churchman (1970,p.B-47).

See also Churchman's discussion of the "disinterested
observer" and his emotional life (1968b,p.188-189) where
he writes: "Some knowledge of the emotional life of every
observer must be understood to make sure that the
observer's world is separable from this other world."
That same chapter on "Realism and Idealism" (p.171) is
recommended to those who feel that these matters are
"too theoretical" in the context of design and use of
information systems.


In spite of our frequent citations, Churchman is not
alone in this deep and intensive criticism. Wilensky,
Downs and other contributors to Westin (ed.) (1971)
put these viewpoints in a concrete and broad socio-political
perspective. Shortly before his death, the "father"
of cybernetics, Norbert Wiener, gave a cybernetic interpretation
of the dangers of narrow-minded use of computers
(1960), and Johnson & Kobler expand those views
in other terms in a later paper (1962).


If we relate all the above to Margenau's remark (1966, p.354)
on the simplicity of physics' invariances, and to Churchman's
comments on the meaning of social invariances (1968a,p.224;
1968b,p.188) we think we have enough material for
expressing the hypothesis that the search for "simplicity"
in human matters may be dangerously biased. By this,
we mean that if the search, after so many expensive efforts,
turns out to be "successful" it may result in the
discovery of constants and invariances which will further
direct inquiry in inhuman ways.


In a recent presentation of the work on a symbol-manipu- 
lation project we asked the lecturer what would be the 
applications of future advances of the project. We were 
informed that at a higher level of sophistication it 
might be useful for social planning and military applications.
Our next question was how the system would be
tested.


We did not get any answer; but we think that the question
was not properly understood, since symbol-manipulation
has no "frame of reference" for discussing test and
quality in the sense of our paper. We think, however,
that such a question must be thoroughly answered if we
are going to place any confidence in practical uses of
such systems.


INFORMATION QUALITY AND LAW


In the course of our paper we pointed to the importance 
of tying down the accuracy of information to particular 
humans. Research is necessary in order to refine the 
possibilities to define decision-makers. 


We now want to emphasize the possibility that all concern
with security, secrecy, privacy, integrity, and
confidentiality may indeed be a subproblem of the general
issue of quality. Maybe 90 % of all evils, in some
sense, will derive from authorized use of information
which is misused because of our limited knowledge of
its quality, or of its right processing. Is it possible
that the present concern with security etc. is a symptom
of the "communication" approach to information systems ?
As if the whole question amounted to guaranteeing that the
information is "plugged" into the right mind with the
GOOD JUDGEMENT ? The mind of an EXPERT ?


We feel that our study suggests that the basic human
right in the context of data-banks and information systems
is that EACH CITIZEN BE INFORMED ABOUT WHAT IS RECORDED
ABOUT HIS OWN PERSON AND ABOUT WHO HAS USED THIS
INFORMATION FOR WHAT PURPOSE, AND FINALLY THAT HIS OWN
DISAGREEMENT ABOUT THE RECORDED INFORMATION BE RECORDED
AND ALWAYS RETRIEVED TOGETHER WITH IT.


A recommended step could be to implement control of the
quality of that information by guaranteeing that each 
individual has the right to "sign-off" BEFORE informa- 
tion about him is given to somebody else. The sign- 
off would imply AT LEAST the right to negotiate in the 
sense developed in chapter 4. 
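

The right stated above, together with the "sign-off" step, can
be read as a requirement on how a record is stored and released.
The following is only a sketch under our own assumptions, in
a present-day notation (Python): the class "PersonalRecord", its
field names and the sample data are hypothetical illustrations,
not a description of any existing data-bank. The record keeps
the subject's disagreement and a log of uses, and it is released
only after sign-off and always together with the disagreement.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalRecord:
    subject: str
    content: str
    objections: list = field(default_factory=list)  # the subject's recorded disagreement
    usage_log: list = field(default_factory=list)   # (user, purpose) pairs

    def object_to(self, statement: str) -> None:
        """Record the subject's disagreement with the stored content."""
        self.objections.append(statement)

    def release(self, user: str, purpose: str, signed_off: bool) -> dict:
        """Release only after sign-off; objections travel with the content."""
        if not signed_off:
            raise PermissionError("subject has not signed off on this release")
        self.usage_log.append((user, purpose))
        return {"content": self.content, "objections": list(self.objections)}

    def report_to_subject(self) -> dict:
        """What the subject may see: content, objections, and who used it for what."""
        return {
            "content": self.content,
            "objections": list(self.objections),
            "uses": list(self.usage_log),
        }


# Hypothetical example data.
record = PersonalRecord("citizen-1", "late payment, 1971")
record.object_to("the payment was disputed, not late")
released = record.release("credit bureau", "loan application", signed_off=True)
report = record.report_to_subject()
```

The sketch covers only the bookkeeping. The negotiation that a
sign-off implies, in the sense of chapter 4, is not modeled.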


In this same context we also want to recall our discussion
of Churchman's claim for the need at least of a
system of legal controls so that the user of the infor- 
was convicted of burglary} (Churchman,1968b,p.196). 

As Buckley expresses it (1967,p.44) "individuals" are
not discrete. What is discrete to the human observer's
limited sensory apparatus is simply the physical organism.
Or again Churchman (1968b,p.123): "From the point
of view of synthesis, rather than analysis, the so-called
simple component, so dear to the heart of the empiricist,
is not simple at all. It is a component only
because someone has had the imagination to construct the
system of which it is a part; it is highly complicated
because to show in what way it is a component at all is
a long and tedious task. The issue is not whether the
system exists; the issue is whether a component exists."
Compare this with the discussion by Shapere (1966,p.47),
Margenau (1966,p.335,343), and the concepts of "eigenfunctions"
and "field functions" in physics.


Thus, the problem is much more complicated than, as is
sometimes mentioned in the context of data management,
"to guarantee that access to "data" be limited to those
capable of using it correctly". The organization literature
sometimes mentions that one important problem
of "source-(of information) evaluation" is that of falsification
of performance measurements. This view runs
counter to the spirit of our paper. We think that our previous
discussions of judgement etc. may be further stimulated
by referring to the literature on LIES, versus
FALSIFICATION, versus POOR JUDGEMENT (for example, Morgenstern
1963, p.25,81). Maybe the denomination varies
depending upon which organizational level they are committed
at ? Legal equality may indeed require judicially
binding accountability of "decision-makers".


The definition of decision-makers may also be a step
towards control of abuses of statistical techniques for
"predicting" behavior in minority groups. "Dagens Nyheter"
(Dec.5 1970, Feb.6 1972, Feb.11 1972) reports that for the
purposes of research or "preventive" control, data are
collected on people who e.g. live together without being
married, take tranquilizers, have a tendency to alcoholism,
or have problems at work or with relatives, and on what
language they speak, whether the mother of a child lives
together with the child's father, whether she has
terminated an earlier pregnancy, whether the subject is
sexually deviant or suspected of infidelity in marriage,
or whether he has a particularly weak financial position.
Instead of the original idea that the citizens control
the public servants by means e.g. of an "ombudsman", the
opposite may be happening. This fits, at least,
into the pattern of several contributions to Westin (ed.)
(1971). See also Churchman (1968a,p.110).


Is it conceivable to legislate about the legitimacy of
particular statistical techniques for the purpose of
"predicting" and preventing undesirable individual behavior ?
See our discussion on statistics in chapter 5.


The recent emphasis on secrecy etc. in Sweden raises in- 
teresting questions if seen against Boguslaw's citations: 
"One of the most powerful tools available to a bureaucra- 
cy is secrecy... Perhaps the most significant implica- 
tion of bureaucratic organization is the tendency to con- 
vert all political problems into administrative problems." 
(1971,p.426). And Ferry writes: "Technology is already 
tilting the fundamental relationships of government, and 
we are only in the early stages." (1971,p.213) 

Churchman is also particularly critical of the orientation
of security and secrecy thinking and concludes:
"... one comes to recognize that our society has succumbed
to the vile disease of clogged information processing."
(1968b,p.85)


We have emphasized here public systems. Is the present
kind of secrecy-effort a symptom of reducing quality to
technical and positivistic terms ? Such an approach deviates
from the basic ideas of disagreement and negotiations.


We think that our study indicates some other important
aspects of the privacy-integrity issue. Sometimes a distinction
is made between STATISTICAL versus INTELLIGENCE
systems or between DATA-BANKS versus INFORMATION
PROCESSING SYSTEMS, regarding the requirements and
possibilities of privacy.


In statistical systems privacy is sometimes conceived
as possible by means of aggregations of data on many
people in such a way as to prevent identification of
any particular individual. As E.M.Brooks (1971,p.53)
and A.F.Westin (1971,p.307) point out, however, the original
stored data cannot be aggregated if they are indeed
to be of any use for research or advanced social planning.
It is a basic scientific-conceptual requirement
that attributes be kept related to the particular objects
on which they were observed. If this is not done,
the threat to privacy decreases but at the expense of
an increased threat to the quality of planning: the
aggregations may only help to answer certain questions
but not others, and the individual who was rescued from
an invasion of privacy may become the victim of a
self-fulfilling "prediction" of the behavior of the minority
group to which he is assigned. The problem of
aggregation is also evident from the work of Verba
(1969).
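

The loss caused by aggregation can be shown with a very small
constructed case. The sketch below uses a present-day notation
(Python) and entirely hypothetical data: two populations of four
people, each person observed on two yes/no attributes. Their
aggregated counts per attribute are identical, yet the question
"how many people have both attributes ?" has different answers,
and it can only be answered while the attributes are kept
related to the individual on whom they were observed.

```python
from collections import Counter

# Hypothetical populations: one (attribute_1, attribute_2) pair per person.
population_a = [(True, True), (True, True), (False, False), (False, False)]
population_b = [(True, False), (True, False), (False, True), (False, True)]


def marginals(records):
    """Aggregate counts per attribute, dropping the link to individuals."""
    first = sum(1 for x, _ in records if x)
    second = sum(1 for _, y in records if y)
    return first, second


def joint(records):
    """Counts that keep both attributes related to the same individual."""
    return Counter(records)


same_aggregates = marginals(population_a) == marginals(population_b)
same_joint = joint(population_a) == joint(population_b)
both_in_a = joint(population_a)[(True, True)]
both_in_b = joint(population_b)[(True, True)]
```

Here the aggregates coincide while the joint counts do not: two
people have both attributes in the first population and none in
the second.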


The second distinction, between data-banks and information
processing systems, would suggest that the
privacy-integrity problem is simpler in data-banks
since there we at least know that we have only true
"facts" and the problem reduces to "AUTHORIZATION" in
the sense of making sure that only the right people
get the facts. In information processing systems we
have the added problem of evaluating the quality of
the processing. We hope that our study has made clear,
however, that the issue is much more complicated than
that and that there is no conceptual difference between
data-banks and information-processing systems in this
respect. See the penetrating analysis by Churchman
(1968a,p.113-116,119-125).


Finally we want to remark that many of the above problems
are compounded in the context of the recent
projects to "computerize" law by classifying and storing
judicial data. See for example the Swedish newspaper
"Dagens Nyheter" of March 3, 1972 referring to
a recent article in "Zeit". Political aspects of information
processing leading to self-perpetuating decisions,
disregard of relevant undefined attributes etc.,
are all matters which may be the object of research in
cooperation e.g. with historians. See Rokkan et al.
(1969), the contributions to Westin (1971), Churchman
(1961,p.167), Ackoff (1962,p.174).


SOME POSSIBLE IMPLICATIONS OF "COMMUNICATION" THINKING


One of the most interesting examples of applying our
proposal is the insight that figure 4.10 reduces to
figures 2.1 or 2.2 (with the possible exception that
computed error is not recorded in memory), to the
extent that the controlling observer is identical to,
or dependent on, those who state the assumptions, specify
the action-inputs (operational definitions of
measurements) or design the programs or system.

It appears that in this case, the controlling observer 
may also be seen as setting the "standard" in a sense 
like that discussed in the section on statistics when
reviewing the paper by Hansen et al. Negotiations 
according to figure 4.11 are then not necessary or they 
are simplified since the controller may "enforce" the 
contract, or standard. 


The above insight is consistent with what is sometimes
experienced in the context of simulation conceived as
composed of model-making, decision-making, and model-analysis.
These terms may roughly correspond to system-design
and statement of assumptions including specifications
of inputs in terms of operational definitions
(see "feedback" from 2A to 3 in figure 4.10), system
operation or problem solving or implementation of designed
programs in terms of "action-inputs" (see our
reference to Danielsson's discussion, in chapter 4's
section on "review in administrative processes"), and
outputs to be analyzed. What has been experienced in
computer simulation problems, then, is that it is better
to unify model-making and decision-making under
one and the same responsibility, and isolate model-analysis,
rather than to unify model-making and model-analysis
leaving decision-making "isolated", that is under separate
responsibility. The reason for this preference is
that in the latter case the analysts have a tendency to
design too simple models since they are "easy to analyze".


In terms of our suggestion, "easy to analyze" means
that it is easy to assign errors to input values and
indirectly to the actions that correspond to the operational
specifications of the input measurements: recall
our references to the list of "source errors" in our
appendix A3. On the other hand, if model-making and
decision-making are unified under the same decision-maker,
it may be easier to make a trade-off for allocation of
error between model with specifications and assumptions,
and input values. This appears also consistent with
Churchman's statement on the organizational implications
of his proposed concept of reality, which we applied
to our approach to quality: the controlling observer,
decision-maker or researcher who "authenticates"
the input or output data should also have the responsibility
for the system design. The idea is the same,
that of facilitating trade-off, but Churchman's emphasis
appears to be against the uncritical acceptance of
"authoritatively" given inputs like design parameters,
"facts" or operational specifications of input measurements
(1963, p.12). Since there are in this context
some problems of at least terminology, it should be
interesting to have this interpretation substantiated
by future research. Just to stimulate thinking and to
illustrate possible correspondence of concepts, we propose
the following visualization of modeling traffic
accidents with emphasis on traffic signs (roughly):


Input actions,        |  Decision-making,   |  "Measured", noticed
measured values       |  data collection    |  traffic signs by driver
----------------------+---------------------+-------------------------
Design model,         |  Model-making,      |  "Be careful", look
program, operational  |  predict output     |  around, placement
input specific.       |                     |  & layout
----------------------+---------------------+-------------------------
Output,               |  Model-analysis,    |  Measured number of
control observation   |  Why error ?        |  accidents, and
                      |                     |  investigation


The idea, then, is that to the extent that the model-maker
is not the same responsible person as the decision-maker,
the model will turn out too simple in terms of
naive exhortations "to be careful" or detailed specifications
of the driver's actions in order to make him
notice traffic signs. To the extent that any accidents
happen, the model analyzer, who is the same person as the
one responsible for the model making, will conclude from his
own investigation that the "cause" was (error allocated
to) that the driver did not follow the specifications
which would have allowed him to notice the signs.
The conclusion may be drawn that more severe police
enforcement is desirable to make drivers follow the
specifications.


If the model-maker were the same as the decision-maker,
he might realize the psychological constraints which prevent
noticing and differentiating too many, poorly designed
or improperly placed traffic signs. When allocating
the error detected and investigated by the model
analyzer he may choose between attempting to be more
careful, changing the layout and placement of signs,
or questioning the assumptions of the operational
specifications (their scientific-theoretical basis), that is,
the conditions under which he must notice the signs
(too high traffic intensity, traffic planning etc.).


The above is to be regarded simply as an illustrative
hypothesis for explaining the importance of having the
design and operation of a system not under the control
of the analyzer, for proper allocation of inaccuracies.
If not, inaccuracies may happen to be defined and computed
in such a way as to be allocable to wrongly performed
measurement processes, that is, "observation"
errors, without questioning the basis for the operational
specification of the measurement process. As suggested
by our discussion in chapter 1, this is related to
the empiricist-positivist approach and may amount to not
questioning the factual content of the input, being then
equivalent to the "communication" approach discussed in
the context of figures 2.1 and 2.2.


Of particular interest in the context of such research,
exploring the justification of the thoughts above, would
be to analyze the scientific meaning of Emery's statements
on accuracy of estimates of input data for analytic
or simulation models (Emery, 1969, p.97). Recall from
app. no.1 that Emery suggests that somebody MAKES STRUCTURAL
CHANGES IN THE PHYSICAL PROCESS BEING MODELED,
whenever the INHERENT STATISTICAL VARIABILITY in the
process precludes narrowing the range of an estimate
to within the region of relative insensitivity. What
would this approach imply if applied to SOCIAL processes ?
The question is whether structural changes would be
made in the social processes in order to make them fit,
say, the models used for social planning. In such a case
one would regard the inherent statistical variability
as the error, caused by random influences. Compare this
concept of RANDOM ERROR with our discussion of systematic
and random error when redefining quality in chapter
five.


The whole issue above bears intuitively an interesting
relationship to J.Marschak's approach to the economics
of information and his suggested conceptualization of
"OBJECTIVE" versus "SUBJECTIVE" ranking of so-called
information structures (and instruments) according to
their values (see Marschak, 1959, p.86). An information
structure is defined by him as the way in which an informant
or an information instrument PARTITIONS THE SET OF
ALL POSSIBLE STATES OF NATURE (which set he apparently
considers as a given fact). Information is
defined by him as the set of all potential messages associated
with a given instrument (source or channel) of information.


Marschak goes on to state that whether a particular
information structure yields a greater expected payoff
than another structure depends in general on the PAYOFF
function. Payoff is defined as that function of the
ACTION and of the STATE of nature whose expected value
is being maximized by the decision-maker. It is then
noted that the ranking of information structures is a
"SUBJECTIVE" matter, inasmuch as it depends on the usefulness
of information for a given user.


Marschak then poses the question whether there are pairs
of partitions (information structures) such that the 


It appears to us that it is an extremely interesting 
object of further research to compare the above approach 
with ours in this paper. We did not start from a given 
set of states of nature but we rather saw such states 
as the result of CODING AS MEASUREMENT. It appears to 
us that coding structures are equivalent to the parti- 
tions or information structures above. Coding schemes 
may also be seen as specification of alternatives. 
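
The concepts referred to above (partition of the states of
nature, payoff as a function of action and state, expected
payoff) can be put into a small numerical sketch. It is written
in a present-day notation (Python), and the four states, the
uniform probabilities, the two actions and the two payoff
functions are all hypothetical assumptions of ours, not taken
from Marschak. The decision-maker is told only which cell of
the partition contains the true state and picks the best action
for that cell.

```python
def expected_payoff(partition, prior, actions, payoff):
    """Sum over cells of the best achievable expected payoff within the cell."""
    total = 0.0
    for cell in partition:
        total += max(sum(prior[s] * payoff(a, s) for s in cell) for a in actions)
    return total


states = ["s1", "s2", "s3", "s4"]
prior = {s: 0.25 for s in states}  # assumed equal probabilities
actions = ["a", "b"]

# Two information structures, neither of which refines the other.
structure_x = [{"s1", "s2"}, {"s3", "s4"}]
structure_y = [{"s1", "s3"}, {"s2", "s4"}]
# A finer structure: a sub-partition of structure_x.
structure_fine = [{"s1"}, {"s2"}, {"s3", "s4"}]


def payoff_user_1(action, state):
    # This user is paid for matching action "a" to the states s1 and s2.
    return 1.0 if (action == "a") == (state in {"s1", "s2"}) else 0.0


def payoff_user_2(action, state):
    # This user is paid for matching action "a" to the states s1 and s3.
    return 1.0 if (action == "a") == (state in {"s1", "s3"}) else 0.0


x_for_1 = expected_payoff(structure_x, prior, actions, payoff_user_1)
y_for_1 = expected_payoff(structure_y, prior, actions, payoff_user_1)
x_for_2 = expected_payoff(structure_x, prior, actions, payoff_user_2)
y_for_2 = expected_payoff(structure_y, prior, actions, payoff_user_2)
fine_for_1 = expected_payoff(structure_fine, prior, actions, payoff_user_1)
fine_for_2 = expected_payoff(structure_fine, prior, actions, payoff_user_2)
```

With these assumed numbers the first user ranks structure_x
above structure_y and the second user ranks them the other way
round, so the ranking depends on the payoff function. The finer
structure is at least as good as structure_x for both users.
In every case the set of states and its admissible partitions
were fixed by whoever wrote the sketch.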

We can now relate this to what R. Boguslaw writes 
(A.F.Westin, editor, p.425): "...the exercise of force 
is related to the range of action alternatives made 
possible. The person with the ability to specify the 
alternatives...is the one who possesses power. And so 
it is that a designer of systems, who has the de facto 
prerogative to specify the range of phenomena that his 
system will distinguish, clearly is in possession of 
enormous degrees of power (depending, of course, upon 
the nature of the system being designed). It is by no 
means necessary that this power be formalized through 
the allocation of specific authority..." 


The most remarkable conclusion from all the above
is that Marschak's approach may then suggest the definition
of "OBJECTIVE" ranking of values as a ranking
which somebody obtains when, for example, he is forced
to fit his view of the world as a sub-partition of
the view established by somebody more powerful than him !


This hypothesis suddenly pushes us from the comfortable
realm of Shannon's mathematical theory of communication
into sheer political science and gives added emphasis
to what Churchman states (1961, p.167): "...the basis
for a decision about the "next event" may very well have
been already inherently established in decisions about
the relevance and accuracy of the data." In this case
what may be already established is the relevance and
accuracy of the states of nature, information structure,
and set of possible actions associated to payoffs.
Compare these concepts with model or program, and operational
specification of measurement actions.


We propose then that further research develop the above
ideas and apply them to the analysis of a particular
problem. It could be seen as a test of whether the
"communication" type of research is biased in the sense that
it encourages agreement at the expense of certain types of
disagreements. Is it from this point of view justified
to analyze public reaction and social implications of
information systems or data-banks in terms of similar
experience from the implementation of telegraph, radio
and telephone systems ? Are we right in suggesting that
Marschak's approach offers no alternative to the specification
of quality of information ? See for instance
his concept of "faulty information" as related to the
concepts of external and internal environmental states
(1959,p.89).


Consider the following concrete illustration suggested
by our own experience. CODING STRUCTURES for input to
manufacturing information systems may tend to grow in
a disordered way. Imagine that a CODING DECISION,
that is, a decision on which code should be assigned
to a particular part used in the manufactured product,
is indeed a "description of the nature" of the
part in terms of an implicit specification of how its
attributes or properties should be data-processed. To
the extent that this is so, the human coder may feel
the need to be assisted by a "decision-table" (of the
type used for computer programming) since each coding
decision tends to look like an alternative outcome out
of a complex decision-table.
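

A decision-table of the kind mentioned above might look as
follows. This is only a sketch in a present-day notation
(Python); the attribute names "purchased" and "stocked" and the
codes P1, P2, M1, M2 are hypothetical and do not come from any
actual manufacturing system. Each rule pairs a set of conditions
on the part with the code to be assigned, and the first rule
whose conditions all hold decides the code.

```python
# Hypothetical coding table: (conditions on the part, code to assign).
CODING_TABLE = [
    ({"purchased": True, "stocked": True}, "P1"),
    ({"purchased": True, "stocked": False}, "P2"),
    ({"purchased": False, "stocked": True}, "M1"),
    ({"purchased": False, "stocked": False}, "M2"),
]


def assign_code(part: dict, table=CODING_TABLE):
    """Return the code of the first matching rule, or None if no rule fits."""
    for conditions, code in table:
        if all(part.get(name) == value for name, value in conditions.items()):
            return code
    return None


code_for_known_part = assign_code({"purchased": True, "stocked": False})
code_for_unforeseen_part = assign_code({"purchased": True})
```

The first part receives the code P2. The second part, for which
one attribute was never observed, receives no code at all: the
table offers the coder no way of describing it, apart from
forcing it into one of the existing alternatives.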


Coding under such circumstances is no longer a reasonably
simple determination of an attribute or property of an
object, class or event. Objects and events lose identity
as in the case of weak or non-existent theory building.
Coding instructions resemble more and more a series
of operational (instrumental) definitions instructing
the human coder on how to measure the reality
structured by the information system. (For details refer
back to our example in chapter 3.) The coder or input
agent or "decision-maker" is actually forced to
follow the instructions if he is to describe and code
"correctly and objectively" the observed event. If the
coder is dissatisfied with the coding structure he may
meet economic-technical objections of the type described
for example by R.Boguslaw (1971, p.421). In order to
prevent total system breakdown, the coder may, with time,
have to follow more and more complex and detailed
coding instructions that require, in fact, that the
coder implicitly describes in detail the nature and order
of one processing sequence (out of the set of sequences
allowed by the system). The system then processes
the input.


Does this description fit both the material of chapter
3 and of the paragraphs above ? Does this situation in
some sense imply that the system "predicts" ex-post by
requiring that the input carries with it much relevant
information ? What are the implications for more
complex information systems for public planning ?


Important aspects of the broad coding problem are covered
by Oettinger (1971, p.250) and by Boguslaw (1971,
p.419). Which possibilities exist to build into the
system features for detection of poor coding structures ?
Do such possibilities meet the criteria for meaningful
operational definitions as implied for example by Ackoff
(1962,p.146), Churchman (1948,p.112), Margenau (1966,
p.336), Shapere (1966, p.44), Northrop (1947,p.126) ?


It should be noted that a meaningful operationalism must
be tied down to some theory or equivalently to some
commitment (Morgenstern 1963,p.304; Churchman 1961,
p.344). This is what allows specification of requirements,
as when one specifies the required characteristics of
an electric motor: such specification is possible because
we have a meaningfully operationalized theory of physics;
and it is naive to believe that one can specify the
required information system without having a theory on
the subject matter of the system.


As Buckley suggests (1967, p.92-93) commitments and theories
require a common acceptance and agreement on concepts
(probably related to the fact that one cannot define
information as independent of the subject on whom it
acts; communication may be regarded as an extension of
the process whereby one organism attempts to influence
another organism; see Buckley 1967,p.49,54). This may
be the reason why the NAMING OF DATA-ELEMENTS OR TERMS
is so important an aspect of the "DATA-MANAGEMENT" problem
(see CD, 1970; IBM Form SC20-8096) in appendix A1.
It may, therefore, also be naive to expect that data-management
can be accomplished without having disagreement
and negotiation built into the system design. The
reader will recall that our proposal in chapter 4 puts
emphasis on such features. If our understanding is right,
we have reasons to expect that alternative implementations
of data-banks and information systems on a national basis
will meet immense difficulties in the above respect.


Under such circumstances WHAT ARE THE IMPLICATIONS OF
"FAILING IN MANAGING THE DATA"? Are there any social
and political implications? Since the positivistically
oriented literature does not recognize the impact of
these issues in systems design and operation, it may be
legitimate to ask for more precise operational
definitions for all those terms like distortion, absorption,
screening, condensing, sampling, compiling, aggregating,
compression, filtration, amplification, etc. of
information that is said to occur in business and social
organization structures. And there are some highly
political-economic applications of positivistic thinking:
an example may be O.E. Williamson's comments and
conclusions, in the context of antitrust, about the
beneficial effects of private multidivisional
organizations (1970, p.178).


Several important contributions to the interplay
between information, economics, politics and sociology
may be found in the August 1970 issue of "Management
Science". See especially the comments by J.F. Collins.
The whole issue dealt with urban management problems,
mostly in their relation to information systems.

See also parts 3 and 4 of A.F. Westin (ed.) (1971),
especially the contributions by Gross and by Boguslaw
but also others like Ferry, Wilensky, Downs, and Hoos.
A dissertation by G.D. Brewer about management of
cities and information systems (1970) shows the immense
complexity of the problem and the immense naiveté of
the expensive and fashionable "simulation of society"
etc. As we mentioned earlier, Churchman summarizes
many political matters (1968a, p.40,45,90-94,100,159;
1968b, see index) and ethical ones (1970; 1968b,
part 3).


D.T. Campbell, from a different point of view, analyzes
many important political realities and refers to
"socially relevant data-banks" in a paper from 1969.
W. Buckley (1967, p.173) summarizes a cybernetic
interpretation of social and political problems. Swedish
readers find in Ekecrantz (1971, 1972) some extensive
discussions of the relation between information and
sociology: his views may be regarded as politically
militant and therefore we looked for opposing views that
would give a more complete image of the state of the
debate in the country. We were not able, however, to
find any such alternative views. This reminds us of
Westin's experience in the U.S.A.:


"Interestingly, I have not found any treatment of in- 
formation technology in the writings of the American 
radical-right. They may simply take it for granted that 
computer technology is tightening the hold of a 
"pro-communist conspiracy" in business, government, and 
the intellectual community. Or, they may see information 
technology as a minor element in the larger moral
confrontation between their poles of "godless communism"
and "american values". In any event, I have found no
radical-right commentaries to include in this section
on the larger setting of advanced technology in
democratic society." (1971, p.151)


An interesting object of research, in Sweden, would be 
to investigate the implications of the non-existence of 
such a debate in the country.


In reading this paper, it is justified to question the
scientific method and the exposition of our own work, as
a basis for confidence in our conclusions.


Because of the nature of information, and because of the 
large scope of, particularly, public information systems, 
we want to see our own work in the context of the general 
issue of the management of inquiry. A summary on this 
issue is presented, for example, by F. Betz (1971). 


This leads us to recognize the fundamental considerations
which first arise when regarding professional control or
scientific methodology as decision activities: the kinds
and extent of agreement which determine scientific
judgements. In reviewing classifications of different
modes of scientific emphasis and evaluation, that is, of 
decision methods of institutional science, we felt that 
the most appropriate mode for this study is the one that 
Churchman names as NONCONVENTIONAL, NONFORMAL, DEDUCTIVE
(1961).


Without going into further details here, we will point out
that this mode implies, for example, that the agreement
leading to scientific judgements, i.e. conclusions
accepted by a disciplinary group of scientists, does not
depend on the acceptance of any conditions or rules for
membership in the group. Furthermore, the emphasis of the
group is not on the study and awareness of inferential
rules: it is felt that attempts to formalize may imply
premature methodological commitments, as suggested by
some literature mentioned in appendix A11. And finally,
the presentation of the material is in "essay" form and
it is not essentially an inductive generalization on a
report of empirical data: factual support is only one of
the bases for acceptance of principles or postulates.


In order to meet the questions raised by e.g. the material
reviewed in appendixes A1, A2, and A11, we attempted to
give to our work a stronger methodological basis. Thus,
we also tried to satisfy several of the requirements for
form and content in conceptual and operational definitions
(Ackoff, 1962; Churchman, 1948). We have also relied on
extensive citations, sometimes from more summarizing
literature.


Our whole study draws upon a large body of literature
whose authors we acknowledge and thank for our having been
able to edit, translate, or cite its contents. Our whole
study, however, may be seen as essentially based on:


1. Shewhart (1939) who ties the study down to the concrete 
and well-established realm of manufacturing, physics, 
and statistical method. 

2. Churchman (1948 and 1961) who extends Shewhart's insight 
into other areas of activity and relates the whole to 
the developments of scientific method. 

3. Morgenstern (1963) who on the basis of extensive expe- 
rience furnishes a valuable testimony of the importance 
of accuracy in economics, and clearly illustrates the 
   limitations of information-processing.


We feel that Churchman's summary of his work up to about
the year 1968, as presented in "Challenge to Reason" (1968b),
provides a rough theoretical frame for both the above
literature and this paper of ours. We expect that this
integrating function will also be possible in terms of
Churchman's latest book "Design of Inquiring Systems"
(1971), which was not yet available to us at the time this
is written.


The reader may find it remarkable that our study
relies so heavily on Churchman's work. We felt that the
remarkable thing was to notice, after several months of
fruitless study, that his work for the first time allowed
us to discuss the quality of information in information
systems. Other literature does not even permit one to frame
a statement of the problem!


Our reliance on Churchman's work might be a serious
weakness of our study if it implied that we have relied on the
ideas of only one "expert". We think, however, that
Churchman is one of the few "experts" related to
operations-research and information-systems who has indeed bothered
to pay due attention to various past and contemporary
scientific-philosophical contributors. This is a far cry
from the individual systems-analyst who, after some fifteen
years of professional experience with computer systems,
combines his ideas with those of his peers, puts them down
in a book, and then claims to have created a novel
"philosophy" of data-processing and organizational control.
The implications of this image appear well captured by
Margenau (1966) in discussing the philosophical neutrality
of newer branches of science in Western nations. Computer
science is not alone: what Margenau says may be as well
applicable to, say, psychology as applied to validation of
the accuracy of testimony in judicial contexts.


Because of the importance of Churchman's work for our study,
we have looked for the strongest possible criticism of it.
Radnitzky (1970) and Kyburg (1962) attribute to Churchman 
viewpoints most of which appear explicitly contradicted in 
most of his writings. In general we feel that the criti- 
cism should be based on a deeper familiarization with his 
work. In particular, a proper understanding of "Prediction 
and Optimal Decision" (1961) is enhanced by a prior reading 
of "Theory of Experimental Inference" (1948). 


For a further appreciation of the criticism against Church- 
man we deem it valuable to compare his exposition of the 
philosophy of science with Kyburg's own in a recent book 
(1968). We recommend also Shapere's discussion of meta- 
scientific and formal-logic approaches (1966), and Ackoff's 
criticism of the so-called general systems theory (1964). 
We feel that a methodologically justified use of
system-concepts requires a much deeper understanding of the
possible meaning of systems, as probably presented by Churchman
himself in his latest book (1971) or as found in the text
and references of Mason (1969), Mitroff et al. (1970), and 
Mitroff (1971). 


In summary, the criticism that we could raise against the 
basis of this study appears to be irrelevant for its pur- 
poses and has strengthened our confidence in the conclu- 
sions. 